ESTIMATION BY THE MINIMUM DISTANCE METHOD IN
NONPARAMETRIC STOCHASTIC DIFFERENCE EQUATIONS¹


By J. WOLFOWITZ
Cornell University 


1. Introduction. The present paper is intended to report some of the ideas 
described in a special invited address delivered by the author at the meeting of 
the Institute of Mathematical Statistics at Chicago on December 29, 1952. This 
address dealt with two topics: a) the connection between the method of maximum 
likelihood and the Wald theory of decision functions, with an explanation of the 
asymptotic efficiency of the former; and b) estimation by the minimum distance 
method. The first of these topics is discussed in [1], and this paper will be devoted 
to a discussion of the second. 

The origin of the minimum distance method is to be found in [2]. Applications 
of the method were extended and generalized in [3]. The paper [4] contains a 
theorem which is an essential tool. A paper by Kac, Kiefer, and the present 
author, entitled “On tests of normality and other tests of goodness of fit based 
on the minimum distance method,” is in preparation. 

The method of estimation to which this paper is devoted is characterized by 
the fact that the estimators are always such as to minimize the distance between 
suitably chosen distribution functions (d.f.). In a variety of problems, which 
includes many where classical methods, like that of maximum likelihood, fail 
to give consistent estimators, it yields estimators which actually converge with 
probability one to the quantities being estimated; we call such estimators
super-consistent. The problems treated in the present paper provide examples of this.

The basic ideas of the proofs of the super-consistency of these estimators are 
to be found in [2] and [4]. Application of the minimum distance method, unlike 
that of the method of maximum likelihood, is not mechanical, and, in the cases 
we have treated, always requires the development of special results. 

The present paper presents results on problems not hitherto treated in the 
literature. It is intended to be largely self-contained, and its organization is as 
follows. Section 2 gives essential preliminaries. Section 3 contains a statement of 
some of the results already obtained elsewhere. In Section 4 are formulated three 
new problems in nonparametric stochastic difference equations. In Sections 5, 
6, and 7 we exhibit minimum distance estimators for these problems. In Sections 
5 and 6 we prove the super-consistency of the first two estimators. 

Received 6/24/53.

¹ This research was supported in part by the United States Air Force under Contract
No. AF18(600)-685 monitored by the Office of Scientific Research.

Editor's Note: This paper was presented to the Chicago meeting of the Institute of
Mathematical Statistics, December 29, 1952, and is published in the Annals by invitation of
the Institute Committee on Special Invited Papers.

In a few places the proofs are not given in all detail in the interest of brevity,
but sufficient detail is given to exhibit the fundamental ideas and spirit of the
method. Places where the proofs below are not given in full detail are: Section 
5, in the paragraph containing (5.10) and in the following paragraph; Section 6, 
for equations (6.3), (6.6), and (6.10), and in the paragraph following the one 
containing equation (6.10). At these points references to [2] and [4] are given 
where the reader will find similar theorems completely proved; a study of these 
proofs will enable him to reconstruct the missing points in all detail. The spirit 
of these results is discussed below when we discuss the basic ideas of the method. 
The proof of the result of Section 7 is omitted because it is easier than, and so 
much like, the proofs of Sections 5 and 6. Section 8 consists of concluding re- 
marks. 

The author is very grateful to Professor J. L. Doob for several helpful dis- 
cussions while this paper was being written. Professors L. Hurwicz, T. Koopmans, 
and J. Marschak were very kind in answering the writer’s questions about the 
literature and problems of stochastic difference equations. 

The author wishes to take this opportunity to apologize for the inclusion, in 
the paper [2], of its Section 10. This section was by way of an incidental remark 
and had nothing to do with the minimum distance method. The idea of this 
section, as was kindly pointed out to the author by Professor W. Kruskal, was 
previously employed by Geary [5]. 


2. Essential preliminaries. Let $s_1, s_2, \ldots, s_k$ be $k$ numbers. By their empiric
d.f. we mean a function, say $S(x)$, such that $kS(x)$ is equal to the number of
these numbers $s_1, \ldots, s_k$ which are less than $x$. Let $(s_1, t_1), \ldots, (s_k, t_k)$ be $k$
couples of numbers. By their empiric d.f. we mean a function, say $S(x, y)$,
such that $kS(x, y)$ is the number of couples $(s_i, t_i)$, $i = 1, \ldots, k$, such that
$s_i < x$ and $t_i < y$. Similar definitions apply in higher dimensions.
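
The two definitions above can be written out directly. The following is a minimal sketch in Python; the function names `empiric_df` and `empiric_df_2d` are hypothetical and chosen only for this illustration. Note the strict inequality, which follows the definition in the text.

```python
from bisect import bisect_left


def empiric_df(sample):
    """Return S, where k*S(x) is the number of sample values strictly less than x."""
    ordered = sorted(sample)
    k = len(ordered)

    def S(x):
        # bisect_left gives the count of elements strictly below x.
        return bisect_left(ordered, x) / k

    return S


def empiric_df_2d(pairs):
    """Return S, where k*S(x, y) counts couples (s, t) with s < x and t < y."""
    k = len(pairs)

    def S(x, y):
        return sum(1 for s, t in pairs if s < x and t < y) / k

    return S


S = empiric_df([3, 1, 2])
S2 = empiric_df_2d([(1, 1), (2, 3), (3, 2)])
```

Here `S(2)` counts only the value 1, and `S2(2.5, 2.5)` counts only the couple (1, 1).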

Let $(Z_1, Z_2)$ be a pair of chance variables. Their d.f. $G(x, y)$ is $P\{Z_1 < x \text{ and } Z_2 < y\}$,
where $P\{\ \}$ denotes the probability of the relation in braces.
Similar definitions apply in one and higher dimensions. 

We stress here, as we have done in our previous papers, that our method 
does not depend upon any particular definition of distance, and is applicable 
with very many definitions. One of the problems requiring investigation is, in 
fact, to determine which definition will yield better results, and in what sense. 
Failing such knowledge, we will adopt the Fréchet distance² which is mathematically
convenient and not otherwise unreasonable.

Let $S_1(x)$, $S_2(x)$ be a pair of d.f.'s. The distance $\delta(S_1, S_2)$ between them will
be defined by

    $\delta(S_1, S_2) = \sup_x |S_1(x) - S_2(x)|.$

² The notion of a metric space is due to Fréchet, so that in a large sense every distance is
a Fréchet distance. We shall adopt the customary designation of the distance $\delta$ between two
distribution functions as the Fréchet distance.
Similarly, the distance $\delta$ between the d.f.'s $S_1(x, y)$ and $S_2(x, y)$ will be defined by

    $\delta(S_1, S_2) = \sup_{x, y} |S_1(x, y) - S_2(x, y)|.$

Similar definitions apply in higher dimensions. Let $K$ be a class of d.f.'s. The
distance $\delta(S_0, K)$ of the d.f. $S_0$ from the class $K$ will be defined by

    $\delta(S_0, K) = \inf_{K' \in K} \delta(S_0, K').$

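
As an illustration of these distances, here is a small Python sketch for the special case where every d.f. involved is the empiric d.f. of a finite sample. The names `sup_distance` and `distance_to_class` are hypothetical, and the class $K$ is represented by a finite list of samples, so the infimum reduces to a minimum; the classes in the text are in general infinite.

```python
def sup_distance(sample_1, sample_2):
    """sup over x of |S1(x) - S2(x)| for the empiric d.f.'s of two samples."""
    # Both step functions are constant between consecutive pooled points, so
    # comparing the counts of values <= each pooled point covers every value
    # that the difference S1 - S2 takes.
    pooled = sorted(set(sample_1) | set(sample_2))
    k1, k2 = len(sample_1), len(sample_2)
    best = 0.0
    for p in pooled:
        s1 = sum(1 for v in sample_1 if v <= p) / k1
        s2 = sum(1 for v in sample_2 if v <= p) / k2
        best = max(best, abs(s1 - s2))
    return best


def distance_to_class(sample_0, candidate_samples):
    """Distance from one empiric d.f. to a finite class of empiric d.f.'s."""
    return min(sup_distance(sample_0, c) for c in candidate_samples)
```

For example, `sup_distance([1, 3], [2, 4])` is 0.5, attained just to the right of 1 and again just to the right of 3.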
Let $Y_1, Y_2, \ldots$ be a sequence of independent, identically distributed chance
variables with the common d.f. $G(x)$. Let $G_n^*(x)$ be the empiric d.f. of $Y_1, \ldots, Y_n$.
The theorem of Glivenko-Cantelli ([8], page 260) states:

(2.1)    $P\{\lim_{n \to \infty} \delta[G(x), G_n^*(x)] = 0\} = 1.$
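
The statement (2.1) can be observed numerically. The sketch below assumes NumPy and takes $G$ to be the uniform d.f. on $[0, 1]$, a choice made only for this illustration; the helper name `distance_to_uniform` is hypothetical. It computes the distance between $G$ and the empiric d.f. for a small and a large sample.

```python
import numpy as np


def distance_to_uniform(sample):
    """sup over x of |G_n^*(x) - G(x)|, with G the uniform d.f. on [0, 1]."""
    y = np.sort(sample)
    n = len(y)
    i = np.arange(1, n + 1)
    # The supremum is approached at the jump points of the empiric d.f.,
    # from one side or the other.
    return float(max(np.max(i / n - y), np.max(y - (i - 1) / n)))


rng = np.random.default_rng(0)
distances = {n: distance_to_uniform(rng.random(n)) for n in (100, 10_000)}
```

One would expect the distance for the larger sample to be markedly smaller, roughly in proportion to the inverse square root of the sample size.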
Let $\{Y_{ij}\}$, $j = 1, \ldots, m_i$, $i = 1, 2, \ldots$ ad inf., be independently distributed
chance variables. Let $G_i(x)$ be the common d.f. and $G_i^*(x)$ be the empiric d.f.
of $Y_{i1}, \ldots, Y_{i m_i}$. Define

    $G^n(x) = \frac{\sum_{i=1}^n m_i G_i(x)}{\sum_{i=1}^n m_i}$  and  $G^{*n}(x) = \frac{\sum_{i=1}^n m_i G_i^*(x)}{\sum_{i=1}^n m_i}.$

An important tool in some applications of the minimum distance method is
the following result (proved in [3]):

(2.2)    $P\{\lim_{n \to \infty} \delta[G^n(x), G^{*n}(x)] = 0\} = 1.$


(The approach to the limit in (2.2) is actually uniform in the $G$'s; see Theorem
4.2 of [3].)

Let $\{Y_{ij}\}$, $i = 1, \ldots, k$; $j = 1, 2, \ldots$, ad inf., be a sequence of independent
chance variables such that, for each $i$, $\{Y_{ij}\}$, $j = 1, 2, \ldots$, ad inf., all have the
same d.f. Let $q = (q_1, \ldots, q_k)$ be any $k$ real parameters. Let $G(x \mid q)$ be the
d.f. of $\sum_{i=1}^k q_i Y_{i1}$, and $G_n(x \mid q)$ be the empiric d.f. of

    $\left(\sum_{i=1}^k q_i Y_{ij}\right), \quad j = 1, \ldots, n.$

Another important tool in the application of the minimum distance method is
the following result (first proved in [4]):

(2.3)    $P\{\lim_{n \to \infty} \sup_q \delta[G(x \mid q), G_n(x \mid q)] = 0\} = 1.$



Actually this theorem is valid under much weaker hypotheses, and at the end 
of [4] there is given a prescription for proving this result under weaker condi- 
tions with essentially the same proof. In the new applications of the minimum 
distance method which we shall make later in this paper, we will actually make 
essential use of this theorem under several sets of weaker conditions. It should 
be noticed that this theorem does not merely say that the d.f. $G_n(x \mid q)$ converges
to the d.f. $G(x \mid q)$ uniformly in $q$ (actually the convergence of $G_n(x \mid q)$ to
$G(x \mid q)$ is not only uniform in $q$ but actually uniform in $G$ (see the proof of
Theorem 4.2 of [3])). The theorem actually says that the convergence is simultaneous
for all $q$ from $-\infty$ to $+\infty$, which is considerably more than uniformity.


3. Some previous results obtained by the minimum distance method. In this 
section we describe a few results already obtained, together with some heuristic 
considerations underlying them. We shall sometimes forego full generality in the 
interest of clarity of exposition. 

Let $\{X_{ij}\}$, $i = 1, \ldots, n$; $j = 1, \ldots, m_i$, be independently distributed chance
variables. (In [3] we discussed also the case where the chance variables are
not independent.) Let $F_i(x \mid \theta, \alpha_i)$ be the d.f. of $X_{i1}, \ldots, X_{i m_i}$. The parameters
$\theta$ and $\alpha_i$ upon which this d.f. depends are unknown; for simplicity we take them
to be scalars although our results are equally valid for vectors. The parameter
$\theta$ occurs in every group of $X$'s ($X_{i1}, \ldots, X_{i m_i}$ constitute the $i$th group) and
was called by Neyman and Scott [6] “structural.” The parameter $\alpha_i$ occurs only
in the $i$th group and was called “incidental.”

Let $T$ be a (given) set within which $\theta$ is known to lie (of course $T$ may be the
whole line). Write

    $\bar{\alpha}_n = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $\bar{\alpha} = (\alpha_1, \alpha_2, \ldots).$

Let $A_n$ be the set within which $\bar{\alpha}_n$ is known to lie, and $A$ the set within which $\bar{\alpha}$
is known to lie. Let $F_i^*(x)$ be the empiric distribution function of
$X_{i1}, X_{i2}, \ldots, X_{i m_i}$, and define

    $B^n(x) = \frac{\sum_{i=1}^n m_i F_i^*(x)}{\sum_{i=1}^n m_i}.$
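
It may help to observe that the weighted average defining $B^n(x)$ is simply the empiric d.f. of all the observations pooled together. The following Python sketch checks this on a toy example; the helper names are hypothetical and the data are invented purely for illustration.

```python
def empiric_value(sample, x):
    """Value at x of the empiric d.f. of a sample (strict inequality)."""
    return sum(1 for v in sample if v < x) / len(sample)


def pooled_empiric_value(groups, x):
    """Weighted average of the group empiric d.f.'s, with weights m_i."""
    total = sum(len(g) for g in groups)
    return sum(len(g) * empiric_value(g, x) for g in groups) / total


groups = [[1, 2, 3], [2.5]]                      # m_1 = 3, m_2 = 1
pooled = [v for g in groups for v in g]
```

With these data, both `pooled_empiric_value(groups, 2.6)` and `empiric_value(pooled, 2.6)` count three of the four observations.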


Let $\bar{\alpha}'_n = (\alpha'_1, \ldots, \alpha'_n)$, and define

    $C^n(x \mid \theta', \bar{\alpha}'_n) = \frac{\sum_{i=1}^n m_i F_i(x \mid \theta', \alpha'_i)}{\sum_{i=1}^n m_i}.$

Let $\theta_n^*, \alpha_{1n}^*, \ldots, \alpha_{nn}^*$ be Borel-measurable functions of $X_{11}, \ldots, X_{n m_n}$
such that (writing $\bar{\alpha}_n^* = (\alpha_{1n}^*, \ldots, \alpha_{nn}^*)$) $\theta_n^* \in T$, $\bar{\alpha}_n^* \in A_n$, and

(3.1)    $\delta[C^n(x \mid \theta_n^*, \bar{\alpha}_n^*), B^n(x)] < \frac{1}{n} + \inf_{\theta' \in T,\ \bar{\alpha}'_n \in A_n} \delta[C^n(x \mid \theta', \bar{\alpha}'_n), B^n(x)].$

The estimator $\theta_n^*$ is a minimum distance estimator. Under a reasonable
restriction it is proved in [3] that $\theta_n^*$ is a super-consistent estimator of $\theta$. The
basic ideas of this simple proof are as follows:
i) From (2.2) it follows that

(3.2)    $P\{\lim_{n \to \infty} \delta[C^n(x \mid \theta, \bar{\alpha}_n), B^n(x)] = 0\} = 1.$

Hence from the definition of $\theta_n^*$ it follows that a fortiori

(3.3)    $P\{\lim_{n \to \infty} \delta[C^n(x \mid \theta_n^*, \bar{\alpha}_n^*), B^n(x)] = 0\} = 1.$

ii) If $\theta_n^*$ differs appreciably from $\theta$ then the distance

(3.4)    $\delta[C^n(x \mid \theta_n^*, \bar{\alpha}_n^*), C^n(x \mid \theta, \bar{\alpha}_n)]$

is appreciable. This is essentially the postulated restriction.

iii) Equations (3.2) and (3.3) imply that the distance (3.4) is almost always
small for large $n$. Hence $\theta_n^*$ cannot differ appreciably from $\theta$ for large $n$.

We remind the reader that the above is only a heuristic outline of the proof, 
and also that the final result is a limiting property which holds with probability 
one. 

Consider now the following problem: Let $\xi = \xi_1, \xi_2, \ldots$ ad inf. be an infinite
sequence of constants which are unknown to the statistician. Let $\alpha$ and $\beta$ be
parameters unknown to the statistician. Let $(u_i, v_i)$, $i = 1, 2, \ldots$, ad inf., be
a sequence of identically, independently, and jointly normally distributed pairs
of chance variables, which the statistician cannot observe. The means of $u_1$
and $v_1$ are known to be zero; their covariance matrix is unknown. Let the
observable chance variables be $(x_i, y_i)$, $i = 1, \ldots, n$, where

    $x_i = \xi_i + u_i$ and $y_i = \alpha + \beta \xi_i + v_i.$

The problem is to give consistent estimators of $\alpha$ and $\beta$.

Let $c_1$ and $c_2$ be any real numbers and $A_n(x \mid c_1, c_2)$ be the empiric d.f. of
$\{y_i - c_1 - c_2 x_i\}$ for $i = 1, \ldots, n$. Let $N^*$ be the class of all normal d.f.'s with
mean zero. Define $a_n$ and $b_n$ as any Borel-measurable functions of the arguments
$x_1, \ldots, x_n, y_1, \ldots, y_n$, such that

    $\delta[A_n(x \mid a_n, b_n), N^*] < \frac{1}{n} + \inf_{c_1, c_2} \delta[A_n(x \mid c_1, c_2), N^*].$

It is proved in [3] under reasonable restrictions on the sequence $\xi$ that $a_n$ and
$b_n$ are super-consistent estimators of $\alpha$ and $\beta$, respectively. The basic ideas of
this proof are as follows.

i) From (2.1) it follows that

(3.5)    $P\{\lim_{n \to \infty} \delta[A_n(x \mid \alpha, \beta), N^*] = 0\} = 1.$

Hence a fortiori we have

(3.6)    $P\{\lim_{n \to \infty} \delta[A_n(x \mid a_n, b_n), N^*] = 0\} = 1.$

ii) One proves that

(3.7)    $P\{\lim_{n \to \infty} \delta[A_n(x \mid a_n, b_n),\ n^{-1} \sum_{i=1}^n N(x \mid (\alpha - a_n) + (\beta - b_n)\xi_i,\ \sigma^2(b_n))] = 0\} = 1,$

where $N(x \mid d_1, d_2)$ is the normal d.f. with mean $d_1$ and variance $d_2$,

    $\sigma^2(c) = \sigma_2^2 - 2c\rho\sigma_1\sigma_2 + c^2\sigma_1^2,$
    $E u_1^2 = \sigma_1^2, \quad E v_1^2 = \sigma_2^2, \quad E u_1 v_1 = \rho\sigma_1\sigma_2.$
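
The mean and variance of the normal components in (3.7) can be read off by substituting the model into the residual. The short derivation below is added for orientation only and uses nothing beyond the definitions already given.

```latex
\begin{aligned}
y_i - c_1 - c_2 x_i
  &= (\alpha + \beta \xi_i + v_i) - c_1 - c_2 (\xi_i + u_i) \\
  &= (\alpha - c_1) + (\beta - c_2)\,\xi_i + (v_i - c_2 u_i), \\
\operatorname{Var}(v_i - c_2 u_i)
  &= \sigma_2^2 - 2 c_2 \rho \sigma_1 \sigma_2 + c_2^2 \sigma_1^2
   = \sigma^2(c_2).
\end{aligned}
```

Thus each residual is normal with mean $(\alpha - c_1) + (\beta - c_2)\xi_i$ and variance $\sigma^2(c_2)$, and the residuals have a common mean of zero for every sequence $\xi$ only when $c_1 = \alpha$ and $c_2 = \beta$.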


iii) One proves that, if $|\alpha - c_1| + |\beta - c_2|$ is appreciably different from
zero, then the distance from $N^*$ of

(3.8)    $n^{-1} \sum_{i=1}^n N(x \mid (\alpha - c_1) + (\beta - c_2)\xi_i,\ \sigma^2(c_2))$

is appreciably different from zero.

From i, ii, and iii one concludes that $a_n$ approaches $\alpha$ and $b_n$ approaches $\beta$.

The postulated restrictions on $\xi$ are such as to enable us to draw conclusions
ii and iii. Meager restrictions suffice for this. In particular, if $\xi_1, \xi_2, \ldots$ are
independent observations on a chance variable whose distribution is not normal
(this is the case treated in [2]), these restrictions are satisfied with probability
one. The conclusion of iii is proved by a compactness argument. In proving
equation (3.7) one uses an argument similar to that used to prove (2.3) and
first proves

(3.9)    $P\{\lim_{n \to \infty} \sup_{c_1, c_2} \delta[A_n(x \mid c_1, c_2),\ n^{-1} \sum_{i=1}^n N(x \mid (\alpha - c_1) + (\beta - c_2)\xi_i,\ \sigma^2(c_2))] = 0\} = 1.$

From this (3.7) follows easily. The result (3.9) is much deeper and more difficult
to prove than the result (2.2). One cannot obtain (3.7) directly from (2.2)
because $a_n$ and $b_n$ are functions of $x_1, \ldots, x_n, y_1, \ldots, y_n$ and not constants.
The relation (3.9) says not merely that (2.2) holds in this particular set-up
uniformly in $c_1, c_2$ but actually simultaneously for all pairs $c_1, c_2$, which is
considerably more than uniformly. This is essentially the relationship between
(2.1) and (2.3). Mere uniformity is easy to prove but it is not what is needed.
In general, the proof of the super-consistency of $a_n$ and $b_n$ is much more
difficult and elaborate than the corresponding proof for $\theta_n^*$, chiefly in the need
for proving (3.9). The operational reason seems to be the following: When
estimating $\theta$ one has a definite empiric d.f. at his disposal, ($B^n(x)$), and compares
it with the “true” d.f. and the nearest d.f. When estimating $\alpha$ and $\beta$ one has
to adjust the empiric d.f. ($A_n(x \mid c_1, c_2)$) until its distance from a sum of normal
distributions which themselves depend upon the empiric d.f. is least. In the new
problems treated below one obtains the estimator by varying a parameter until
two empiric d.f.'s which depend upon it are closest together. One could then
anticipate correctly that the proof of super-consistency will be more complicated
as a result.


4. Statement of the new problems. In all that follows the indices $i$ and $n$ are
to run through all the positive integers, and the index $j$ is to run through all
the integers, unless the contrary is explicitly stated. The sequence $\{u_j, v_j\}$
will always be a sequence of independent chance variables. All the $u$'s are to
have a common distribution, and all the $v$'s are to have a common distribution.
To avoid the trivial it will be assumed throughout this paper that neither $u_1$
nor $v_1$ is constant with probability one. The chance variables $\{u_j, v_j\}$ are
statistically nonobservable variables. This means, mathematically speaking, that the
estimators we shall construct will be functions of other (observable) variables,
which will always be denoted by $x_i$.

PROBLEM A. Suppose

(4.1)    $x_i = u_i + \alpha u_{i-1}$

with $\alpha$ an unknown constant which may be any number less than one in absolute
value. No assumption whatever will be made on the distribution of $u_1$. The
problem is to estimate the parameter $\alpha$; for all $n$ we are to construct Borel measurable
functions $a_n(x_1, \ldots, x_n)$ such that $a_n \to \alpha$ at least in probability. (Ours will
converge to $\alpha$ with probability one.)

PROBLEM B. Suppose $\beta$ is an unknown constant which may be any number
less than one in absolute value (other than zero), and

(4.2)    $y_i = \beta y_{i-1} + u_i.$

We wish this process to be stationary. It is easily seen that this implies that

(4.3)    $y_i = \sum_{j=-\infty}^{i} \beta^{i-j} u_j.$

Some assumption has now to be made so that the series in (4.3) will converge
with probability one. Now let

(4.4)    $x_i = y_i + v_i.$

The problem is to construct estimators of the parameter $\beta$, that is, Borel measurable
functions $b_n(x_1, \ldots, x_n)$ for all $n$ such that $b_n \to \beta$; our estimators will
converge with probability one.

PROBLEM C. Suppose $\gamma$ is an unknown constant which may be any number
less than one in absolute value, and

(4.5)    $x_i = \gamma x_{i-1} + u_i.$

The chance variable $x_0$ is chosen so as to make the process $\{x_i\}$ stationary. It
is easily seen that this implies that

(4.6)    $x_i = \sum_{j=-\infty}^{i} \gamma^{i-j} u_j.$

We shall assume that (4.6) converges. The problem is to construct estimators of
the parameter $\gamma$, that is, Borel-measurable functions $g_n(x_1, \ldots, x_n)$ for all $n$
such that $g_n \to \gamma$; our estimators will converge with probability one.

Problems involving several simultaneous equations of higher order or problems 
of greater difficulty can of course also be treated by our minimum distance 
method. It seems to the author, however, that the attendant complications 
would obscure the essential points of the method. This explains our choice of 
problems. 

5. Problem A. For convenience we will suppose in this section that the
number of observations $\{x_i\}$ is odd and equal to $2n + 1$. Thus we will construct,
for every positive $n$, a function $a_n(x_1, \ldots, x_{2n+1})$ of the arguments exhibited.

Let $a$ be a real parameter which, in this section, will always be less than one
in absolute value, and $A_n(x \mid a)$ be the empiric d.f. of $\{x_i - a x_{i-1}\}$ for
$i = 2, \ldots, 2n + 1$. Define

(5.1)    $B_n(x, y \mid a) = A_n(x \mid a) \cdot A_n(y \mid a).$

Let $C_n(x, y \mid a)$ be the bivariate empiric d.f. of the pairs

    $\{(x_{2i+1} - a x_{2i}),\ (x_{2i} - a x_{2i-1})\}, \quad i = 1, \ldots, n.$

Let $a_n$ be any Borel-measurable function of $x_1, \ldots, x_{2n+1}$ such that $|a_n| < 1$
and

(5.2)    $\delta[B_n(x, y \mid a_n), C_n(x, y \mid a_n)] < \frac{1}{n} + \inf_{|a| < 1} \delta[B_n(x, y \mid a), C_n(x, y \mid a)].$

THEOREM 1. We have

(5.3)    $P\{\lim_{n \to \infty} a_n = \alpha\} = 1,$

that is, $a_n$ is a super-consistent estimator of $\alpha$.
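
The definition (5.2) can be sketched numerically. The block below is a minimal illustration and not the author's procedure: it assumes NumPy, replaces the supremum over all $(x, y)$ by a maximum over a finite grid of sample quantiles (so it approximates $\delta$ only from below), and replaces the infimum over $|a| < 1$ by a search over a finite grid of candidate values. The function names, the grid sizes, and the simulated normal $u_i$ are all choices made for this sketch; the theorem itself places no restriction on the distribution of $u_1$.

```python
import numpy as np


def product_vs_joint_distance(x, a, n_thresholds=39):
    """Grid approximation to the distance between B_n(.,.|a) and C_n(.,.|a)."""
    w = x[1:] - a * x[:-1]              # x_i - a*x_{i-1} for i = 2, ..., 2n+1
    first, second = w[1::2], w[0::2]    # components (index 2i+1, index 2i), i = 1..n
    # Assumption of this sketch: compare the d.f.'s only at sample quantiles of w.
    thr = np.quantile(w, np.linspace(0.025, 0.975, n_thresholds))
    a_vals = (w[:, None] <= thr).mean(axis=0)          # A_n at each threshold
    f1 = (first[:, None] <= thr).astype(float)
    f2 = (second[:, None] <= thr).astype(float)
    c_vals = f1.T @ f2 / len(first)                    # C_n on the threshold grid
    b_vals = np.outer(a_vals, a_vals)                  # B_n(x, y) = A_n(x) * A_n(y)
    return float(np.abs(b_vals - c_vals).max())


def estimate_alpha(x, grid):
    """Grid-search stand-in for the estimator a_n of (5.2)."""
    distances = [product_vs_joint_distance(x, a) for a in grid]
    return float(grid[int(np.argmin(distances))])


rng = np.random.default_rng(0)
alpha, n = 0.5, 10_000
u = rng.standard_normal(2 * n + 2)      # u_0, ..., u_{2n+1}
x = u[1:] + alpha * u[:-1]              # x_i = u_i + alpha*u_{i-1}, i = 1..2n+1
grid = np.linspace(-0.95, 0.95, 39)
a_hat = estimate_alpha(x, grid)
```

With data simulated in this way one would expect `a_hat` to fall in the neighborhood of 0.5, and the distance at the true value to be much smaller than at a value far from it. A finer grid and a larger sample trade computing cost for accuracy.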


The remainder of this section will be devoted to a proof of Theorem 1. Define

(5.4)    $w_i(a) = x_i - a x_{i-1} = u_i + (\alpha - a) u_{i-1} - a\alpha u_{i-2}.$

Hence

(5.5)    $w_i(\alpha) = u_i - \alpha^2 u_{i-2}.$

We see that $w_i(\alpha)$ and $w_{i'}(\alpha)$ are independently distributed whenever $|i - i'| = 1$.
Also, for any $a$, $w_i(a)$ and $w_{i+3}(a)$ are independently distributed.
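
The algebraic identity behind (5.4) and (5.5) is easy to confirm numerically. The following check assumes NumPy; the particular values of $\alpha$ and $a$ and the simulated $u$'s are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, a = 0.4, -0.7
u = rng.standard_normal(50)             # u_0, ..., u_49
x = u[1:] + alpha * u[:-1]              # x_i = u_i + alpha*u_{i-1}, i = 1..49

# Left side of (5.4): x_i - a*x_{i-1}, i = 2..49.
w = x[1:] - a * x[:-1]
# Right side of (5.4): u_i + (alpha - a)*u_{i-1} - a*alpha*u_{i-2}.
rhs = u[2:] + (alpha - a) * u[1:-1] - a * alpha * u[:-2]

# Special case (5.5): at a = alpha the middle term drops out.
w_at_alpha = x[1:] - alpha * x[:-1]
rhs_at_alpha = u[2:] - alpha**2 * u[:-2]
```

Both pairs of arrays agree up to floating-point rounding.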

Let $H(x \mid a)$ be the d.f. of $w_i(a)$. Let $A_{in}(x \mid a)$, $i = 1, 2, 3$, be the empiric d.f.
of all $w_j(a)$ such that $2 \le j \le 2n + 1$, and $j \equiv i \pmod 3$. Thus each $A_{in}(x \mid a)$
is the empiric d.f. of independently distributed chance variables ($w_j(a)$), each of
which is the same linear combination of independently (within each sequence)
distributed $u_j$'s. From our generalization [4] of the theorem of Glivenko-Cantelli
we obtain that

(5.6)    $P\{\lim_{n \to \infty} \sup_{|a| < 1} \delta[A_{in}(x \mid a), H(x \mid a)] = 0\} = 1$ for $i = 1, 2, 3$.

Hence

(5.7)    $P\{\lim_{n \to \infty} \sup_{|a| < 1} \delta[A_n(x \mid a), H(x \mid a)] = 0\} = 1.$

Let

(5.8)    $D(x, y \mid a) = H(x \mid a) \cdot H(y \mid a).$

From (5.7) we obtain

(5.9)    $P\{\lim_{n \to \infty} \sup_{|a| < 1} \delta[B_n(x, y \mid a), D(x, y \mid a)] = 0\} = 1.$


Let $E(x, y \mid a)$ be the d.f. of $(w_3(a), w_2(a))$. Then the infimum of

    $\delta[E(x, y \mid a), D(x, y \mid a)]$

in the domain $\{|a| < 1,\ |a - \alpha| \ge d > 0\}$ is, say, $l(d) > 0$. This is proved by
a compactness argument based on the following two facts:
i) If $l(d) = 0$ then, for some number $a_0$ with $|a_0 - \alpha| \ge d$ and $|a_0| \le 1$,

(5.10)    $E(x, y \mid a_0) = D(x, y \mid a_0).$


ii) But this cannot hold because, when $(\alpha - a)(1 - |a\alpha|) \ne 0$ (as is surely
the case for $a = a_0$), $w_3(a)$ and $w_2(a)$ are not independently distributed, as (5.8)
would then imply. To show the latter we employ the following argument. Let
$\varphi(t)$ be the logarithm of the characteristic function of $u_1$. This is well defined
in a neighborhood of the origin, which is the only place where we will require
$\varphi(t)$; we use that branch of the function $\varphi(t)$ for which $\varphi(0) = 0$. The
independence of $w_3(a)$ and $w_2(a)$ would imply that, in a neighborhood of the origin,

    $\varphi([\alpha - a]s + t) + \varphi(-a\alpha s + [\alpha - a]t)$
        $= \varphi([\alpha - a]s) + \varphi(t) + \varphi(-a\alpha s) + \varphi([\alpha - a]t).$

If now

    $\varphi([\alpha - a]s + t) = \varphi([\alpha - a]s) + \varphi(t)$

for all $s$ and $t$ in a neighborhood of the origin, then $\varphi(s) = c_0 s$ in a neighborhood
of the origin, where $c_0$ is a constant. Since $\exp\{\varphi(s)\}$ is a characteristic function
at least for small $|s|$, it follows that $c_0$ is purely imaginary and hence that the
characteristic function of $u_1$ is $\exp\{c_0 s\}$ for all $s$. This violates the assumption
that $u_1$ is not constant with probability one and proves the desired result.
We now employ the following argument due to J. L. Doob. Define (always only
in a neighborhood of the origin) the function

    $\psi(s, t) = \varphi(s + t) - \varphi(s) - \varphi(t).$

Then $\psi(s, t)$ is continuous and $\psi(s, t) = \psi(t, s)$. Also $\psi(0, t) = 0$ and

    $\psi([\alpha - a]s, t) + \psi(-a\alpha s, [\alpha - a]t) = 0.$

Hence

    $\psi(s, t) = -\psi(-a\alpha[\alpha - a]^{-1}s, [\alpha - a]t)$
        $= -\psi([\alpha - a]t, -a\alpha[\alpha - a]^{-1}s)$
        $= \psi(-a\alpha t, -a\alpha s) = \psi(-a\alpha s, -a\alpha t).$

Now always $|a\alpha| < 1$ since $|a| < 1$. For every positive integer $n$ we have

    $\psi(s, t) = \psi((-a\alpha)^n s, (-a\alpha)^n t).$

Hence $\psi(s, t) = 0$ in a neighborhood of the origin, so that the proof is complete.


Essentially as in [4], making use of the fact that each $w_i$ is a linear combination
of independent $u$'s, one can prove that

(5.11)    $P\{\lim_{n \to \infty} \sup_{|a| < 1} \delta[C_n(x, y \mid a), E(x, y \mid a)] = 0\} = 1.$

The facts cited in the last two paragraphs are basic to our proof of convergence. 
Theorems corresponding to them are proved in [2] and [4] and cited below for 
our other problems. The method of proof for new problems will consist in part of 
choosing suitable d.f.’s for which one can assert similar theorems. The proofs 
of the present theorems require considerable detail, but can be constructed by 
the reader who understands the ideas of the proofs in [2] and [4]. They are 
omitted here and in subsequent sections of this paper because their detailed 
exposition would make this paper inordinately long for both reader and writer. 


Suppose now that the theorem is not true. Then there exist positive $d_1$ and $d_2$
such that

(5.12)    $P\{\limsup_{n \to \infty} |a_n - \alpha| > d_1\} > d_2.$

Hence

(5.13)    $P\{\limsup_{n \to \infty} \delta[E(x, y \mid a_n), D(x, y \mid a_n)] \ge l(d_1)\} > d_2.$

From (5.11) we obtain

(5.14)    $P\{\lim_{n \to \infty} \delta[C_n(x, y \mid a_n), E(x, y \mid a_n)] = 0\} = 1.$

From (5.13) and (5.14) we obtain

(5.15)    $P\{\limsup_{n \to \infty} \delta[C_n(x, y \mid a_n), D(x, y \mid a_n)] \ge l(d_1)\} > d_2.$

From (5.9) and (5.15) we obtain

(5.16)    $P\{\limsup_{n \to \infty} \delta[B_n(x, y \mid a_n), C_n(x, y \mid a_n)] \ge l(d_1)\} > d_2.$

Since $w_i(\alpha)$ and $w_{i+1}(\alpha)$ are independently distributed we have

(5.17)    $E(x, y \mid \alpha) = D(x, y \mid \alpha).$

From (5.9) therefore we obtain

(5.18)    $P\{\lim_{n \to \infty} \delta[B_n(x, y \mid \alpha), E(x, y \mid \alpha)] = 0\} = 1.$

From (5.11) and (5.18) we obtain

(5.19)    $P\{\lim_{n \to \infty} \delta[B_n(x, y \mid \alpha), C_n(x, y \mid \alpha)] = 0\} = 1.$

From (5.2) and (5.19) we obtain

(5.20)    $P\{\lim_{n \to \infty} \delta[B_n(x, y \mid a_n), C_n(x, y \mid a_n)] = 0\} = 1.$

The contradiction between (5.16) and (5.20) proves Theorem 1.


6. Problem B. For convenience we will assume in this section that the number
of observations $\{x_i\}$ is $4n + 1$. Thus we will construct, for every $n$, a function
$b_n(x_1, \ldots, x_{4n+1})$ of the arguments exhibited.

Let $b$ be a real parameter which throughout this section will be assumed to
be less than one in absolute value. Let

    $m_i(b) = x_i - b x_{i-1} = u_i + (v_i - b v_{i-1}) + (\beta - b) y_{i-1}.$

If $|i - i'| \ge 2$ then $m_i(\beta)$ and $m_{i'}(\beta)$ are independently distributed. If $b \ne \beta$
and $|i - i'| \ge 2$ then $m_i(b)$ and $m_{i'}(b)$ are not independently distributed.
Let $A_n(x \mid b)$ be the empiric d.f. of $m_2(b), \ldots, m_{4n+1}(b)$. Define

    $B_n(x, y \mid b) = A_n(x \mid b) A_n(y \mid b).$

Let $C_n(x, y \mid b)$ be the bivariate empiric d.f. of the $2n$ pairs

    $(m_2(b), m_4(b));\ (m_3(b), m_5(b));\ (m_6(b), m_8(b));\ (m_7(b), m_9(b));\ \ldots;\ (m_{4n-1}(b), m_{4n+1}(b)).$

Let $b_n$ be any Borel measurable function of $x_1, \ldots, x_{4n+1}$ such that $|b_n| < 1$
and

(6.1)    $\delta[B_n(x, y \mid b_n), C_n(x, y \mid b_n)] < \frac{1}{n} + \inf_b \delta[B_n(x, y \mid b), C_n(x, y \mid b)].$
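
The pairing scheme used for $C_n$ in this section takes observations two apart, in blocks of four indices. A small Python sketch of the index pattern follows; the function name `problem_b_pairs` is hypothetical and serves only to make the pattern explicit.

```python
def problem_b_pairs(n):
    """Index pairs (i, i + 2) whose residuals enter the bivariate empiric d.f."""
    pairs = []
    for block in range(n):
        start = 4 * block + 2
        pairs.append((start, start + 2))        # (2,4), (6,8), ..., (4n-2, 4n)
        pairs.append((start + 1, start + 3))    # (3,5), (7,9), ..., (4n-1, 4n+1)
    return pairs
```

For `n = 2`, that is, nine observations, this yields the four pairs (2, 4), (3, 5), (6, 8), (7, 9). Each index from 2 to $4n + 1$ is used exactly once, and the two members of every pair differ by 2, which is the lag at which the residuals at the true parameter are independent.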


We shall now sketch the proof of 

THEOREM 2. We have 
(6.2)    $P\{\lim_{n \to \infty} b_n = \beta\} = 1.$

For any $b$, the sequence $\{m_i(b)\}$ for $i = 2, 3, \ldots$, ad inf., is a sequence of
stationary chance variables. Moreover, each $m_i(b)$ is a (stationary) linear
combination of $u$'s and $v$'s which are all independently distributed. Hence the
stochastic process $\{m_i(b)\}$ is metrically transitive, for any $b$. Making use of the
ergodic theorem we obtain without difficulty that the conclusion of the Glivenko-
Cantelli theorem ([8], page 260) holds for the sequence $\{m_i(b)\}$, whatever be $b$.
Let $H(x \mid b)$ be the d.f. of $m_i(b)$. Using the methods of [2] and [4] one can prove
that

(6.3)    $P\{\lim_{n \to \infty} \sup_b \delta[A_n(x \mid b), H(x \mid b)] = 0\} = 1.$

Let

(6.4)    $D(x, y \mid b) = H(x \mid b) \cdot H(y \mid b).$

From (6.3) we obtain

(6.5)    $P\{\lim_{n \to \infty} \sup_b \delta[B_n(x, y \mid b), D(x, y \mid b)] = 0\} = 1.$

Let $C_{1n}(x, y \mid b)$ be the bivariate empiric d.f. of the pairs

    $(m_2(b), m_4(b));\ (m_6(b), m_8(b));\ (m_{10}(b), m_{12}(b));\ \ldots;\ (m_{4n-2}(b), m_{4n}(b)),$

and $C_{2n}(x, y \mid b)$ be the bivariate empiric d.f. of the pairs

    $(m_3(b), m_5(b));\ (m_7(b), m_9(b));\ \ldots;\ (m_{4n-1}(b), m_{4n+1}(b)).$

When $|i - i'| \ge 2$, $m_i(\beta)$ and $m_{i'}(\beta)$ are independently distributed. Hence
from the extension to the present case of the bivariate Glivenko-Cantelli theorem
we obtain

(6.6)    $P\{\lim_{n \to \infty} \delta[C_{in}(x, y \mid \beta), D(x, y \mid \beta)] = 0\} = 1, \quad i = 1, 2.$

Hence

(6.7)    $P\{\lim_{n \to \infty} \delta[C_n(x, y \mid \beta), D(x, y \mid \beta)] = 0\} = 1.$

From (6.5) and (6.7) we obtain

(6.8)    $P\{\lim_{n \to \infty} \delta[B_n(x, y \mid \beta), C_n(x, y \mid \beta)] = 0\} = 1.$

From (6.1) and (6.8) we obtain

(6.9)    $P\{\lim_{n \to \infty} \delta[B_n(x, y \mid b_n), C_n(x, y \mid b_n)] = 0\} = 1.$


Using the methods of [2] and [4] it can be proved,³ although considerable
detail is required, that, for $i = 1, 2$,

(6.10)    $P\{\lim_{n \to \infty} \sup_b \delta[C_{in}(x, y \mid b), E(x, y \mid b)] = 0\} = 1,$

where $E(x, y \mid b)$ is the d.f. of the pair $(m_2(b), m_4(b))$. From (6.10) we obtain

(6.11)    $P\{\lim_{n \to \infty} \sup_b \delta[C_n(x, y \mid b), E(x, y \mid b)] = 0\} = 1.$

³ This illustrates the fact that the result of [4] does not require for its validity the
independence of the chance variables. It is actually valid under much weaker conditions and
obviously can be extended to multivariate distributions.
By a compactness argument similar to that of Lemma 2 of [2] one can prove that
the infimum of $\delta[E(x, y \mid b), D(x, y \mid b)]$ in the domain $\{|b| < 1,\ |b - \beta| \ge d > 0\}$
is, say, $l_1(d) > 0$. (The reader will have noticed that $E(x, y \mid \beta) = D(x, y \mid \beta)$, and,
for $b \ne \beta$, that $\delta[E(x, y \mid b), D(x, y \mid b)] > 0$.)

Suppose now that Theorem 2 is not true and there exist positive $d_1$ and $d_2$
such that

(6.12)    $P\{\limsup_{n \to \infty} |b_n - \beta| > d_1\} > d_2.$

Hence

(6.13)    $P\{\limsup_{n \to \infty} \delta[E(x, y \mid b_n), D(x, y \mid b_n)] \ge l_1(d_1)\} > d_2.$

From (6.11) and (6.13) we obtain

(6.14)    $P\{\limsup_{n \to \infty} \delta[C_n(x, y \mid b_n), D(x, y \mid b_n)] \ge l_1(d_1)\} > d_2.$

Together with (6.5) this yields

(6.15)    $P\{\limsup_{n \to \infty} \delta[B_n(x, y \mid b_n), C_n(x, y \mid b_n)] \ge l_1(d_1)\} > d_2.$

The contradiction between (6.9) and (6.15) proves Theorem 2.


7. Problem C. For convenience we will assume in this section that the number
of observations $\{x_i\}$ is odd and $2n + 1$, say. Thus we will construct, for every $n$,
a function $g_n(x_1, \ldots, x_{2n+1})$ of the arguments exhibited.

Let $g$ be a real parameter which throughout this section will be assumed to
be less than one in absolute value. Let

(7.1)    $q_i(g) = x_i - g x_{i-1} = u_i + (\gamma - g) x_{i-1}.$

The chance variables $\{q_i(\gamma)\}$ are all independent of each other. If $g \ne \gamma$ and
$i \ne i'$ then $q_i(g)$ and $q_{i'}(g)$ are not independently distributed.
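
The identity (7.1) says that at the true parameter the residuals are exactly the unobservable $u_i$. A short numerical check follows, assuming NumPy; for simplicity the simulated process is started at an arbitrary value rather than at the stationary $x_0$ of the text, which does not affect the identity.

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = 0.6
u = rng.standard_normal(200)
x = np.empty_like(u)
x[0] = u[0]                             # arbitrary start (assumption of this sketch)
for i in range(1, len(u)):
    x[i] = gamma * x[i - 1] + u[i]      # x_i = gamma*x_{i-1} + u_i


def q(x, g):
    """Residuals x_i - g*x_{i-1} for every i that has a predecessor."""
    return x[1:] - g * x[:-1]
```

At `g = gamma` the residuals reproduce `u[1:]`; at any other `g` they differ from `u[1:]` by `(gamma - g) * x[:-1]`, which is the source of the dependence exploited by the estimator.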
Let $A_n(x \mid g)$ be the empiric d.f. of $q_2(g), q_3(g), \ldots, q_{2n+1}(g)$. Define

    $B_n(x, y \mid g) = A_n(x \mid g) \cdot A_n(y \mid g).$

Let $C_n(x, y \mid g)$ be the bivariate empiric d.f. of the pairs

    $(q_2(g), q_3(g));\ (q_4(g), q_5(g));\ \ldots;\ (q_{2n}(g), q_{2n+1}(g)).$

Let $g_n$ be any Borel-measurable function of $x_1, \ldots, x_{2n+1}$ such that $|g_n| < 1$
and

(7.2)    $\delta[B_n(x, y \mid g_n), C_n(x, y \mid g_n)] < \frac{1}{n} + \inf_g \delta[B_n(x, y \mid g), C_n(x, y \mid g)].$

Then, in a manner similar to that of preceding sections, one can prove that

    $P\{\lim_{n \to \infty} g_n = \gamma\} = 1.$






8. Conclusion. The “practical” value of the minimum distance method
is at present very unclear to the writer. For example, the method enables one
(Section 3 or [2] and [3]) to fit a straight line when both variables are subject
to normal⁴ errors, under an assumption (on $\xi$) so weak that the very pretty
result of Reiersøl [9] is an immediate consequence. (Reiersøl's theorem states
that, if the $\xi_1, \xi_2, \ldots$ of Section 3 are independent chance variables with a
common distribution function which is not normal, then $\alpha$ and $\beta$ are identified.)
However, if one assumes that any cumulant of order not less than three of the
common distribution is not zero (an assumption to which many practical
people would not object) one can, using Geary's method ([5] or [2], Section 10)
expeditiously obtain consistent estimators of $\alpha$ and $\beta$. It might therefore be
argued that the difficulty of the problem is due solely to insistence on
mathematical generality and aesthetics, and disappears when one is willing to make
practical assumptions. The same argument could be made about the problems 
described in Section 4 of this paper; if one assumes second moments to exist 
one can, without any difficulty, obtain consistent estimators. 

It seems to the writer, however, that the minimum distance method is of 
interest precisely because it enables one to solve a class of problems which 
cannot be solved by classical methods, and to do this in a manner which seems 
very reasonable and suggestive. The problems need not be solely problems of 
estimation but may also be problems of testing hypotheses. Thus (see [2], 
page 149) suppose one wishes to test the hypothesis that the common distribution
function of the independent chance variables $z_1, \ldots, z_n$ is normal. One
could base this test on $\delta(Z_n, N^{**})$, where $Z_n(x)$ is the empiric distribution function
of $z_1, \ldots, z_n$, and $N^{**}$ is the class of all normal distribution functions.
Also there is no doubt that the minimum distance method is useful in the solution
of many identification problems (for a discussion of identification problems
see Koopmans [7]). Reiersøl's theorem and other problems of [3] and the present
paper are cases in point. It is the author’s opinion that the minimum distance 
method will also be useful in the treatment of many nonparametric problems. 

An important general problem is to find a method of full generality which 
will yield efficient estimators of structural parameters in the case where each 
new set of observations depends also upon another incidental parameter. The 
solution of this problem is at present unknown. Neyman and Scott [6] have 
shown that the method of maximum likelihood does not always yield efficient 
or even consistent estimators. The minimum distance method as employed 
in [3] (briefly described in Section 3 of the present paper) yields consistent 
estimators in rather wide generality; its efficiency remains to be determined. 

If an efficient estimator does not exist the problem would seem to be to char- 
acterize the complete class of estimators. One should not a priori preclude the 
possibility of employing some reasonable measure of efficiency other than the
usual one. If most or many consistent estimators are not normally distributed
this may be advisable.

⁴ Actually, as pointed out in [2], the errors need not be normal, nor need the linear
relation be in two dimensions only.

Among statistical methods which employ the idea of distance is the one for 
which Kolmogoroff and Smirnoff obtained many asymptotic distributions and 
for which Wald and the present writer obtained small sample results (for a 
description and references see, for example, Birnbaum [10]). Suppose, for example, 
that one wishes to test the simple hypothesis that the distribution function 
of $n$ independent, identically distributed chance variables is a given distribution 
function $F(x)$. The Kolmogoroff-Smirnoff test is based on the Fréchet 
distance between $F(x)$ and the empiric distribution function of the chance 
variables. There is no minimization of distance in the Kolmogoroff-Smirnoff 
test. In the application of the minimum distance method one always minimizes 
the distance between two distribution functions, or between a distribution 
function and a class of distribution functions, or between two classes of distribution 
functions. 

Note added in proof. The author has recently succeeded in applying the minimum 
distance method, in a manner different from that of the present paper, to 
a considerably larger class of problems. Linearity or other such restrictions are 
not needed, application is fairly routine, the proofs are much simpler, and the 
result of [4] is not used. Identified distribution functions can also be estimated. 
A brief description of these results will appear approximately concurrently with 
the present paper in the Proceedings of the National Academy of Sciences. 





ASYMPTOTIC DISTRIBUTION OF SERIAL STATISTICS AND 
APPLICATIONS TO PROBLEMS OF NONPARAMETRIC 

Summary. The asymptotic distribution of a class of statistics, which has 
been called serial statistics, has been obtained, for permutations of the observed 
sample values. Specific instances of the use of such statistics, for the test of 
randomness of a sequence, have been given and the large sample power functions 
have been considered, when the alternative is a Markov process. 

1. Introduction. In testing for the randomness of a linearly ordered set of 
observations $x_1, \dots, x_n$ (such as time series), a plausible alternative hypothesis 
is frequently the existence of either cyclical or other periodic fluctuations, with 
varying amplitudes, including time series of the moving average or the autoregressive 
type as investigated by Yule [16] and Kendall [5]. The whole class of 
such alternatives may be characterised by the absence of a strong monotonic 
trend and predominance of periodicity in the general sense. 

Yule [16] and Kendall [5] have considered the general autoregressive model, 

(1.1) $x_i = f(x_{i-1}, \dots, x_{i-k}) + \epsilon_i$ 

where the $\epsilon_i$'s are independent random variables and $x_1, \dots, x_n$ is the observed 
series. Especially the linear autoregressive model has been successfully applied 
to various situations by Yule [16], Kendall [5], Walker [15], and others. The 
theoretical model in such cases is determined by a law of succession, involving 
at most $k$ successive observations. Thus the relation between neighbouring 
observations is more important for the test of the hypothesis of randomness 
against such alternatives. 
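
As a hedged illustration of the kind of alternative meant here, the following sketch generates a series from the linear special case of the autoregressive model (1.1). The function name, the Gaussian errors, the seed handling and the parameter names are assumptions made for the example; the paper itself leaves the error distribution unspecified.

```python
import random


def simulate_autoregressive(n, coeffs, noise_sd=1.0, start=None, seed=0):
    """Generate x_1..x_n with x_i = sum_j coeffs[j-1] * x_{i-j} + eps_i.

    This is the linear case of model (1.1); Gaussian errors are an
    illustrative assumption, not part of the original text.
    """
    rng = random.Random(seed)
    k = len(coeffs)
    history = list(start) if start is not None else [0.0] * k
    if len(history) != k:
        raise ValueError("start must supply exactly k initial values")
    series = []
    for _ in range(n):
        # history[-1] is x_{i-1}, history[-2] is x_{i-2}, and so on.
        mean = sum(c * history[-j] for j, c in enumerate(coeffs, start=1))
        x = mean + rng.gauss(0.0, noise_sd)
        series.append(x)
        history = history[1:] + [x]
    return series


# A noise-free first-order example: each value is half the previous one.
decay = simulate_autoregressive(4, [0.5], noise_sd=0.0, start=[8.0])
```

With positive `noise_sd` the same call yields a series in which neighbouring observations are related, which is the situation the serial statistics of this paper are designed to detect.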

Where the model does not specify the distributions of the random elements, 
the test of significance must be nonparametric. The nonparametric serial 
correlation test based upon the permutations of the observations $x_1, \dots, x_n$, 
suggested by Wald and Wolfowitz [13] as a test for randomness, seems to be 
suitable in such cases. In the case where a strong and persistent trend exists, 
for example in growth processes, the relations of any observation with all other 
observations in the series are obviously important. A test of randomness suggested 
by H. B. Mann [9] seems to be better suited in such cases. An investigation 
carried out by G. E. Noether [11] strengthens this conclusion, although 
it is difficult to decide between the two types of tests when neither the monotonic 
trend nor the periodic element is predominant. 

A general class of nonsymmetric statistics $S(x_1, \dots, x_n)$ of the serial correlation 
type will be considered in this paper. These depend upon the relation between 
neighbouring observations in the ordered sequence and will be called serial 
statistics. Serial correlation, number of runs up and down (Wolfowitz [12]), 
etc., are instances of serial statistics. In fact most of the existing nonparametric 
tests of randomness are either of the serial statistic type or Hoeffding's [4] 
U-statistic type. 

A serial statistic $S(x_1, \dots, x_n)$ is defined as 

(1.2) $S(x_1, \dots, x_n) = \frac{1}{n} \sum_{i=1}^{n} f_i(x_i, \dots, x_{i+k-1})$ 

where the $f_i(x_i, \dots, x_{i+k-1})$ are functions of the variables $x_i, x_{i+1}, \dots, x_{i+k-1}$ only. In 
the nonparametric method, the conditional distribution of a nonsymmetric 
statistic for fixed sample values, when only permutations of the sample values 
are considered, is used. Such a distribution we have called a nonparametric 
distribution. 
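
The definition (1.2) and the idea of a nonparametric (permutation) distribution can be sketched as follows. This is a minimal illustration under stated assumptions: the same function is used for every $i$, the indices are wrapped circularly (the convention $x_{n+j} = x_j$ that the paper adopts in Section 2), and the lag-one product is chosen only as an example of a serial statistic. Full enumeration of the $n!$ permutations is feasible only for very small $n$.

```python
import itertools


def serial_statistic(xs, f, k):
    """S = (1/n) * sum_i f(x_i, ..., x_{i+k-1}), indices taken circularly."""
    n = len(xs)
    return sum(f(*[xs[(i + j) % n] for j in range(k)]) for i in range(n)) / n


def permutation_distribution(xs, f, k):
    """Sorted values of S over all n! permutations of the sample."""
    return sorted(serial_statistic(p, f, k) for p in itertools.permutations(xs))


def lag_one_product(a, b):
    # Illustrative choice of f: product of neighbouring observations.
    return a * b


sample = [1.0, 2.0, 3.0, 4.0]
values = permutation_distribution(sample, lag_one_product, k=2)
observed = serial_statistic(sample, lag_one_product, k=2)

# Conditional mean of S over permutations, and the proportion of
# permutations giving a value at least as large as the observed one.
permutation_mean = sum(values) / len(values)
p_value = sum(v >= observed for v in values) / len(values)
```

Each permutation carries conditional probability $1/n!$ under the hypothesis of randomness, so `p_value` is an exact one-sided significance level for this small sample.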

The nonparametric distribution of a nonsymmetric statistic, such as 
$S(x_1, \dots, x_n)$, depends upon the unordered sample values $\{x_i\}$ and is thus a 
random distribution function. It has been shown that the nonparametric 
distribution of $S(x_1, \dots, x_n)$ converges stochastically to a normal distribution 
when the absolute moments of the functions $f_i(x_1, \dots, x_k)$ are uniformly bounded 
for all values of $i$ and a certain relation $(B_1)$ holds between the product-moments, 
for large samples. 

This result has been generalised to the case of $p$ serial statistics $S_1(x_1, \dots, x_n), \dots, 
S_p(x_1, \dots, x_n)$. It is shown that their joint distribution converges stochastically 
to the multivariate normal distribution. Under the hypothesis $H_1$, for which 
the variables $x_1, \dots, x_n$ form a Markov process of order $p$, a stochastically 
asymptotic expression for the power function has been obtained, on the basis 
of which a nonparametric test for randomness may be chosen, which for large 
samples discriminates against this type of alternatives. Thus we get a nonparametric 
test of randomness which has asymptotically optimum properties 
for alternatives $H_1$. As pointed out by Wolfowitz [2], no test of randomness 
discriminates against all alternatives and thus any such test has to be designed 
to discriminate strongly against a class of alternatives. 

In the case of a stationary Gaussian process, in which the variables are 
circularly ordered, the method of this paper gives a lower limit for the power of 
the uniformly most powerful test. The existence of a uniformly most powerful 
test has been proved by Lehmann and Stein [7] for the nonparametric case and 
by Lehmann [6] and T. W. Anderson [1] for the parametric case. 


Notations and terminology. Throughout this paper we consider stochastic 
convergence of random variables and random distribution functions. These 
concepts are explained below. 

DEFINITION. Two sequences of random variables, $\{x_n\}$ and $\{y_n\}$, will be 
called asymptotically stochastically equal, denoted by $x_n =_p y_n$, when 
$\Pr\{|x_n - y_n| > \epsilon\} \to 0$ as $n \to \infty$ for any given $\epsilon > 0$. 

If in particular $y_n = y$, independent of $n$, then we have the usual notion of 
convergence in probability and we write $\operatorname{Plim} x_n = y$, a notation due to Wald 
and Mann. This also includes the case when $y$ is a constant. 

We also consider stochastic order relations. 

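DEFINITION. For a sequence of random variables $\{x_n\}$: 

(i) $x_n = O_p(n^\alpha)$ implies $\Pr\{n^{-\alpha}|x_n| > A\} < \epsilon$ holds for $A > A_\epsilon$ and 
$n > n_\epsilon$, for given $\epsilon$, 

(ii) $x_n = o_p(n^\alpha)$ implies $\Pr\{n^{-\alpha}|x_n| > \delta\} < \epsilon$ holds for $n > n(\epsilon, \delta)$, for 
given $\epsilon$ and $\delta$. 

(iii) For a set of random variables $x_{n1}, \dots, x_{nm}$ we write $\{x_{n1}, \dots, x_{nm}\} = o_p(n^\alpha)$ 
when 

$\Pr\{n^{-\alpha}|x_{n1}| < \delta, \dots, n^{-\alpha}|x_{nm}| < \delta\} > 1 - \epsilon$ for $n > n(\epsilon, \delta)$ 

where $m$ may depend upon $n$, say $m = \phi(n)$. 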
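
A short worked example, not taken from the paper, may help fix the stochastic order relations (i) and (ii). It assumes that $\bar{y}_n$ is the mean of $n$ independent observations with common mean $\mu$ and finite variance $\sigma^2$, and uses only Tchebycheff's inequality; the symbols $\bar{y}_n$, $\mu$, $\sigma$ and $\eta$ are introduced here for illustration.

```latex
% Illustration only: \bar{y}_n is the mean of n independent observations
% with common mean \mu and finite variance \sigma^2 (an assumed setting).
\[
  \Pr\{\, n^{1/2}\,|\bar{y}_n - \mu| > A \,\} \;\le\; \frac{\sigma^2}{A^2} \;<\; \epsilon
  \qquad \text{for } A > A_\epsilon = \sigma/\sqrt{\epsilon},
\]
% so \bar{y}_n - \mu = O_p(n^{-1/2}) in the sense of (i). For any \eta > 0,
\[
  \Pr\{\, n^{1/2-\eta}\,|\bar{y}_n - \mu| > \delta \,\}
  \;\le\; \frac{\sigma^2}{\delta^2\, n^{2\eta}} \;<\; \epsilon
  \qquad \text{for } n > n(\epsilon,\delta)
  = \bigl(\sigma^2/(\epsilon\,\delta^2)\bigr)^{1/(2\eta)},
\]
% so \bar{y}_n - \mu = o_p(n^{-1/2+\eta}) in the sense of (ii).
```

The same two-step pattern (bound the variance, then apply Tchebycheff's inequality) is what the later lemmas use to obtain their $o_p$ statements.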

We now introduce the concept of stochastic asymptote, which is very useful 
for the purpose of this paper. 

DEFINITION. Two sequences of random variables, $\{x_n\}$ and $\{y_n\}$, will be 
called stochastically asymptotic, denoted by $x_n \sim_p y_n$, if $\operatorname{Plim} x_n/y_n = 1$. Since 
$\operatorname{Plim} x_n/y_n = 1$ also implies $\operatorname{Plim} y_n/x_n = 1$, the stochastic asymptotic relationship 
is symmetric. 

The following results will be found to be useful. 

LEMMA C. 

(1.3) 
(i) $x_n =_p y_n$ and $1/y_n = O_p(1)$ implies $x_n \sim_p y_n$ 
(ii) $x_n \sim_p y_n$ and $v_n \sim_p w_n$ implies $x_n/v_n \sim_p y_n/w_n$ 
(iii) $x_n \sim_p y_n$ implies $x_n^r \sim_p y_n^r$, for any positive $r$ 
(iv) $x_n =_p y_n$ and $v_n =_p w_n$ implies $x_n + v_n =_p y_n + w_n$. 

PROOF. (ii), (iii) and (iv) are obvious. From $1/y_n = O_p(1)$, 
$\Pr\{|1/y_n| > A\} < \eta$ for $n > n_0$. 
Let now $x_n = y_n + \epsilon_n$; then from $x_n =_p y_n$, 
$\Pr\{|\epsilon_n| > \epsilon/A\} < \eta$ for $n > n_1 > n_0$. 
Thus 
$\Pr\{|x_n/y_n - 1| < \epsilon\} > 1 - 2\eta$, when $n > n_1$. 
Hence (i) follows. 
We consider now a sequence of random distribution functions. 
DEFINITION. A sequence of random distribution functions $\{F_n(x, a_n)\}$, 
where the $a_n$'s are random variables defined in a probability space, will be 
considered to converge stochastically to a distribution function $F(x)$, if, for 
$n > n_0(\epsilon, \delta)$, 

$\Pr\{|F_n(x, a_n) - F(x)| > \epsilon\} < \delta$ 

at all points of continuity of $F(x)$. We denote this as $\operatorname{Plim} F_n(x) = F(x)$. 

A relevant theorem for the stochastic convergence of a sequence of distribution 
functions, from the stochastic convergence of the set of all moments, has been 
proved by Ghosh [3] and will be used. An extension of this result to p-dimensional 
Euclidean spaces is possible by the same method and we shall make use of this, 
without giving a formal proof. For a test of significance based on a nonparametric 
distribution, the power function is itself a random function. The usual notions 
of consistency, asymptotically most powerful test, etc., have to be defined in 
the sense of stochastic convergence. We shall consider these notions for the case 
of nonparametric distributions. 

Let $\{C_n\}$ be a sequence of critical regions, corresponding to a test $T$ of the 
hypothesis $H_0$ in the universe of permutations of sample values, $\Gamma_n(x_1, \dots, x_n)$, 
and let $P_H\{C_n \mid x_1, \dots, x_n\}$ denote the conditional probability of $C_n$ in 
$\Gamma_n(x_1, \dots, x_n)$ under the hypothesis $H$. 

DEFINITION. The test $T$ will be called stochastically consistent against an 
alternative $H_1$ when both 

$\operatorname{Plim} P_{H_0}\{C_n \mid x_1, \dots, x_n\} = \alpha$ (constant), $\operatorname{Plim} P_{H_1}\{C_n \mid x_1, \dots, x_n\} = 1$. 


DEFINITION. If for a class of alternative hypotheses $H_\omega$ ($\omega$ lying in a space $\Omega$) 
there exists a function $F(C_n, \omega)$, depending only on permutations, but not on 
actual values of $x_1, \dots, x_n$, such that 

$\operatorname{Plim} P_{H_\omega}\{C_n \mid x_1, \dots, x_n\}/F(C_n, \omega) = 1$, $\omega \in \Omega$, 

then $F(C_n, \omega)$ will be called a stochastically asymptotic power function of the 
test, for $\omega \in \Omega$, and denoted by 

$P_{H_\omega}\{C_n \mid x_1, \dots, x_n\} \sim_p F(C_n, \omega)$. 


Obviously, for a class of tests $T_1, T_2, T_3, \dots$ which possess stochastically 
asymptotic power functions $F_1, F_2, F_3, \dots$ the notion of most powerful, 
uniformly most powerful, etc., tests corresponds stochastically to similar notions 
for the functions $F_1, F_2, F_3, \dots$ which are independent of the unordered 
sample. 


2. Distribution of serial statistic. Let $x_1, \dots, x_n$ be a sequence of independent 
random variables with absolutely continuous distribution function $F(x)$ and 
$\{f_i(x_1, \dots, x_k)\}$ be a sequence of functions of $x_1, \dots, x_k$ such that 

(A) $\int \cdots \int |f_i(x_1, \dots, x_k)|^s \, dF(x_1) \cdots dF(x_k) < C_s$ 

holds uniformly for $i = 1, 2, 3, \dots$, and for $s = 1, 2, 3, \dots$. We shall consider 
the distribution of 

(2.1) $S(x_1, \dots, x_n) = \frac{1}{n} \sum_{i=1}^{n} f_i(x_i, \dots, x_{i+k-1})$ 

where we put $x_{n+j} = x_j$ for $j > 0$. 
Now $S(x_1, \dots, x_n)$ is a nonsymmetric statistic of $x_1, \dots, x_n$, which assumes 
the values $S_1, \dots, S_N$ for the $N = n!$ permutations of $x_1, \dots, x_n$. For given values 
of $x_1, \dots, x_n$, we shall consider the repartition function (Von Mises [8]) of the 
variable $S(x_1, \dots, x_n)$ in the universe of permutations $\Gamma_n(x_1, \dots, x_n)$. On the 
hypothesis $H_0$ that the variables $x_1, \dots, x_n$ are independently distributed with 
the absolutely continuous distribution function $F(x)$, the conditional probability 
of each point of $\Gamma_n(x_1, \dots, x_n)$ exists and is equal to $1/N$. The repartition of 
the variable $S(x_1, \dots, x_n)$ in $\Gamma_n(x_1, \dots, x_n)$ thus gives the conditional distribution 
of $S(x_1, \dots, x_n)$ in $\Gamma_n(x_1, \dots, x_n)$ under the hypothesis $H_0$. The conditional 
distribution of a nonsymmetric statistic $S(x_1, \dots, x_n)$ in $\Gamma_n(x_1, \dots, x_n)$ will also 
be called its nonparametric distribution, and the distribution function of 
$S(x_1, \dots, x_n)$ in $\Gamma_n(x_1, \dots, x_n)$ will be denoted by $G_n(S, x_1, \dots, x_n)$. The expectation 
of the nonsymmetric statistic $\phi(x_1, \dots, x_n)$ for the conditional distribution 
in $\Gamma_n(x_1, \dots, x_n)$ under the hypothesis $H_0$ is given by 
(2.2) $E'\{\phi(x_1, \dots, x_n)\} = \frac{1}{n!} \sum_P \phi(x_1, \dots, x_n) = \int_{\Gamma_n(x_1 \cdots x_n)} \phi(x_1, \dots, x_n) \, dG_n(S, x_1, \dots, x_n)$ 

where $\sum_P$ denotes summation for permutations of $x_1, \dots, x_n$. Thus 
$E'\{\phi(x_1, \dots, x_n)\}$ is a symmetric function of $x_1, \dots, x_n$. In particular when 
$\phi(x_1, \dots, x_k)$ is a function of $k$ variables only, we shall write 

(2.3) $E'\{\phi(x_1, \dots, x_k)\} = \frac{\sum_P \phi(x_{i_1}, \dots, x_{i_k})}{n(n-1) \cdots (n-k+1)}$. 

Here the symbol $\sum_P$ stands for summation for all sets of different suffixes 
$(i_1, \dots, i_k)$ from 1 to $n$. When there is no ambiguity we shall merely write 
$\sum_P \phi(x_1, \dots, x_k)$ instead of $\sum_P \phi(x_{i_1}, \dots, x_{i_k})$. 

We shall denote by $E\{\phi(x_1, \dots, x_k)\}$ the expectation of the function $\phi(x_1, \dots, x_k)$, 
for the distribution of $\phi(x_1, \dots, x_k)$ in the sample space $R_n$. Let 

(2.4) $M_1 = E'\{S(x_1, \dots, x_n)\} = \frac{1}{n} \sum_{t=1}^{n} \frac{\sum_P f_t(x_1, \dots, x_k)}{n^{[k]}}$ 

where $n^{[k]}$ denotes $n(n-1) \cdots (n-k+1)$. Then $M_1$ is the conditional expectation 
of $S(x_1, \dots, x_n)$ in $\Gamma_n(x_1, \dots, x_n)$. Let also 


(2.5) $M_r = E'\{[S(x_1, \dots, x_n) - M_1]^r\}$ 

be the $r$th moment of $S(x_1, \dots, x_n)$, for the conditional distribution in $\Gamma_n$. We 
shall show that when condition (A) and a further condition $(B_1)$, to be stated 
later, hold 

(2.6) $\operatorname{Plim} \frac{M_r}{M_2^{r/2}} = 0$ when $r$ is odd, and $= \frac{r!}{2^{r/2}(r/2)!}$ when $r$ is even, 

so that the nonparametric distribution of 

(2.7) $\{S(x_1, \dots, x_n) - M_1\}/\sqrt{M_2}$ 

converges stochastically to the normal distribution with mean zero and variance 
unity, from a theorem of Ghosh [3]. Let 


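(2.8) $T(x_1, \dots, x_n) = \frac{1}{n} \sum_{t=1}^{n} g_t(x_t, \dots, x_{t+k-1})$ 

where 

$g_t(x_t, \dots, x_{t+k-1}) = f_t(x_t, \dots, x_{t+k-1}) - M_1$ 

so that $E'\{T(x_1, \dots, x_n)\} = 0$. We have 

(2.9) $M_r = E'\{(T(x_1, \dots, x_n))^r\} = \frac{1}{n^r} E'\Bigl\{\Bigl[\sum_{t=1}^{n} g_t(x_t, \dots, x_{t+k-1})\Bigr]^r\Bigr\}$. 

In the expansion of $[\sum_t g_t(x_t, \dots, x_{t+k-1})]^r$ we get products 

$\prod_t [g_t(x_t, \dots, x_{t+k-1})]^{\rho_t}$. 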


Since $\sum \rho_t = r < n$ for sufficiently large $n$, at most $r$ out of $\rho_1, \dots, \rho_n$ are different 
from zero. The g-factors $g_t(x_t, \dots, x_{t+k-1})$ with nonzero indices may be 
divided into subsets, such that in each subset, a g-factor $g_t(x_t, \dots, x_{t+k-1})$ has 
at least one common x-suffix with another g-factor of the same subset but no 
common x-suffix with g-factors belonging to a different subset. We shall call 
these subsets P-sets. 

Generally, if $\prod_{j=1}^{e} [g_{q_j}(x_{i_1}, \dots, x_{i_k})]^{a_j}$ be any product of g-factors with nonzero 
indices, we can divide them into subsets with the above property and these 
again we call P-sets. The grouping of g-factors into P-sets is obviously invariant 
for permutations of $x_1, \dots, x_n$. The essential character of a P-set depends upon 
the relations between the nonzero values $a_1, \dots, a_e$ of the $\rho$'s and the x-suffixes of 
the g-factors, which are invariant for permutations of $(x_1, \dots, x_n)$. 

Two P-sets will be considered to have the same structure if one can be derived 
from the other by a permutation of $x_1, \dots, x_n$, the suffixes $q_1, \dots, q_e$ of 
$g_{q_j}(x_{j_1}, \dots, x_{j_k})$ being ignored. Thus the structure of a P-set is invariant for 
permutations of $x_1, \dots, x_n$. The number of different g-factors in a P-set will be 
called its length. A P-set of length one will be called a linear P-set. For given 
indices $a_1, \dots, a_e$ the number of P-sets with different structures is obviously 
bounded (independent of $n$), since P-sets of different structures may be obtained 
from the product, 

$[g_1(x_1, \dots, x_k)]^{a_1} [g_2(x_{k+1}, \dots, x_{2k})]^{a_2} \cdots [g_e(x_{(e-1)k+1}, \dots, x_{ek})]^{a_e}$ 

by proper identification of the x-suffixes, that is, by replacing groups of variables 
$x_{i_1}, \dots, x_{i_s}$ by $x_i$, etc. 


We shall consider two special types of P-sets. A P-set is of type I when, by 
a suitable permutation of $(x_1, \dots, x_n)$, it can be expressed as 

$\prod_{i=1}^{l} [g_{q_i}(x_{q_i}, \dots, x_{q_i+k-1})]^{a_i}$, $|q_i - q_{i+1}| < k$, $q_{i+1} > q_i$. 

All P-sets in the expansion of $M_r$ (2.9) are of type I. 

A P-set is of type II when, by a suitable permutation of $(x_1, \dots, x_n)$, it can 
be expressed as 

$g_{q_1}(x_1, \dots, x_k) \, g_{q_2}(x_{\sigma_1}, \dots, x_{\sigma_k})$ 

where $\sigma_j = i$ $(i, j \le k)$ and the other suffixes $\sigma_1, \dots, \sigma_k$ are different from 1 to $k$. 
A P-set of type II is also a P-set of type I when $j = 1$ and $i = k$ and $|q_1 - q_2| < k$. 

We may now write 

(2.10) $M_r = \frac{1}{n^r} \sum_{a,l} C_r(a, l) \sum_q E'\Bigl\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j}, \dots, x_{q_j+k-1})]^{a_j}\Bigr\}$ 

where $l_1 + \cdots + l_m = e$. The summation $\sum_q$ is taken for all P-sets of type 
I and given indices $(a_1, \dots, a_{l_1}), \dots, (a_{l_1+\cdots+l_{m-1}+1}, \dots, a_{l_1+\cdots+l_m})$ with different 
structures, corresponding to the terms in the expansion of $[\sum_t g_t(x_t, \dots, x_{t+k-1})]^r$. 
The summation $\sum_{a,l}$ is taken for all sets of values of $a_1, \dots, a_{l_1+\cdots+l_m}$, such 
that $a_1 + \cdots + a_{l_1+\cdots+l_m} = r$, and for different lengths of P-sets, with appropriate 
coefficients $C_r(a, l) = C_r\{(a_1, \dots, a_{l_1}), \dots, (a_{l_1+\cdots+l_{m-1}+1}, \dots, a_{l_1+\cdots+l_m})\}$, 
which are equal to the number of ways of grouping $r$ factors in (2.9) in sets 
of $(a_1, \dots, a_{l_1})$, etc. 

Before proving the main theorem we shall establish a number of lemmas. 

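LEMMA I. Let $z_1, \dots, z_p$ be real numbers. Then 

(2.11) $|z_1^{\lambda_1} \cdots z_p^{\lambda_p}| \le |z_1|^{\lambda} + \cdots + |z_p|^{\lambda}$ and $|z_1^{\lambda_1} \cdots z_p^{\lambda_p}| \le 1 + |z_1|^{\lambda+1} + \cdots + |z_p|^{\lambda+1}$ 

where $\lambda = \lambda_1 + \cdots + \lambda_p$ and $\lambda_i > 0$. 

PROOF. Let $|z_M|$ be the largest of the numbers $|z_1|, \dots, |z_p|$; then by replacing 
them by $|z_M|$ the product $|z_1^{\lambda_1}| \cdots |z_p^{\lambda_p}|$ only increases. Thus 

$|z_1^{\lambda_1} z_2^{\lambda_2} \cdots z_p^{\lambda_p}| \le |z_M|^{\lambda} \le \sum_{i=1}^{p} |z_i|^{\lambda}$. 

Also $|z_M|^{\lambda} \le 1$ if $|z_M| \le 1$, and $|z_M|^{\lambda} \le |z_M|^{\lambda+1}$ if $|z_M| > 1$. 
Hence 

$|z_1^{\lambda_1} \cdots z_p^{\lambda_p}| \le 1 + \sum_{i=1}^{p} |z_i|^{\lambda+1}$. 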
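
The two bounds of Lemma I, read here as $|z_1^{\lambda_1} \cdots z_p^{\lambda_p}| \le \sum |z_i|^{\lambda}$ and $\le 1 + \sum |z_i|^{\lambda+1}$ with $\lambda = \lambda_1 + \cdots + \lambda_p$, can be spot-checked numerically. The sketch below is an illustration only: the function names, the sampling ranges and the small floating-point slack are assumptions, and a random check is of course not a proof.

```python
import random


def lemma_one_bounds(z, lam):
    """Return (|z_1^l1 ... z_p^lp|, sum |z_i|^l, 1 + sum |z_i|^(l+1))."""
    total = sum(lam)  # lambda = lambda_1 + ... + lambda_p
    product = 1.0
    for zi, li in zip(z, lam):
        product *= abs(zi) ** li
    same_power = sum(abs(zi) ** total for zi in z)
    next_power = 1.0 + sum(abs(zi) ** (total + 1) for zi in z)
    return product, same_power, next_power


def check_random_cases(trials=1000, seed=0):
    """True if both bounds hold (up to rounding slack) on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = rng.randint(1, 5)
        z = [rng.uniform(-3.0, 3.0) for _ in range(p)]
        lam = [rng.uniform(0.1, 3.0) for _ in range(p)]
        product, same_power, next_power = lemma_one_bounds(z, lam)
        slack = 1e-9 * (1.0 + product)
        if product > same_power + slack or product > next_power + slack:
            return False
    return True


example = lemma_one_bounds([2.0, -1.0], [1.0, 2.0])
all_hold = check_random_cases()
```

For the example call, $\lambda = 3$, the product is $2$, the first bound is $2^3 + 1^3 = 9$ and the second is $1 + 2^4 + 1^4 = 18$.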


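LEMMA II. Let 

$M_{s,t} = E'\{(f_t(x_1, \dots, x_k))^s\}$. 

Then 

$\Bigl|E'\Bigl\{\prod_{j=1}^{s} [g_{q_j}(x_{j,1}, \dots, x_{j,k})]^{a_j}\Bigr\}\Bigr|$ 
$\le 1 + d_1 \sum_{j=1}^{s} M_{r+1,q_j} + d_2 \sum_{j=1}^{s} M_{r,q_j} M_1 + \cdots + d_{r+2} M_1^{r+1}$, $r$ odd, 
$\le d_{r+3} \sum_{j=1}^{s} M_{r,q_j} + \cdots + d_{2r+3} M_1^{r}$, $r$ even, 

where the $x_{j,i}$ belong to $x_1, \dots, x_n$ (the double suffix being used for convenience only), 
and $d_1, \dots, d_{2r+3}$ are independent of $n$ and $x_1, \dots, x_n$. 

PROOF. From Lemma I, for $a_1 + \cdots + a_s = r$, 

$\Bigl|\prod_{j=1}^{s} (g_{q_j}(x_{j,1}, \dots, x_{j,k}))^{a_j}\Bigr| \le \sum_{j=1}^{s} (g_{q_j}(x_{j,1}, \dots, x_{j,k}))^{r}$, $r$ even, 
$\Bigl|\prod_{j=1}^{s} (g_{q_j}(x_{j,1}, \dots, x_{j,k}))^{a_j}\Bigr| \le 1 + \sum_{j=1}^{s} (g_{q_j}(x_{j,1}, \dots, x_{j,k}))^{r+1}$, $r$ odd. 

Taking expectations $E'$ for both sides, the inequality holds for expected values 
and 

$\Bigl|E' \prod_{j=1}^{s} [g_{q_j}(x_{j,1}, \dots, x_{j,k})]^{a_j}\Bigr| \le \sum_{j=1}^{s} E'(g_{q_j}(x_{j,1}, \dots, x_{j,k}))^{r}$, $r$ even, 
$\Bigl|E' \prod_{j=1}^{s} [g_{q_j}(x_{j,1}, \dots, x_{j,k})]^{a_j}\Bigr| \le 1 + \sum_{j=1}^{s} E'(g_{q_j}(x_{j,1}, \dots, x_{j,k}))^{r+1}$, $r$ odd. 

The result now follows immediately from 

$E'[g_{q_j}(x_{j,1}, \dots, x_{j,k})]^{r} = \sum_{\nu} (-1)^{\nu} \binom{r}{\nu} M_{r-\nu,q_j} M_1^{\nu}$, 

etc. 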

LEMMA III. Let $U = E'\{\phi(x_1, \dots, x_k)\}$ be a U-statistic of Hoeffding [4]. Then 
we have $\mathrm{Var}(U) \le \tfrac{1}{2} k(3k-1) \, \mathrm{Var}[\phi]/n$ where $x_1, \dots, x_n$ are independent random 
variables with the same distribution. 

PROOF. 

$\mathrm{Var}(U) = E\{[\sum_P (\phi(x_{i_1}, \dots, x_{i_k}) - \bar\phi)/n^{[k]}]^2\}$ 
$= (n^{[k]})^{-2} \sum E\{[\phi(x_{i_1}, \dots, x_{i_k}) - \bar\phi][\phi(x_{j_1}, \dots, x_{j_k}) - \bar\phi]\}$ 

where $\sum$ represents summation for all different sets of $i_1, \dots, i_k$ and $j_1, \dots, j_k$, 
and $\bar\phi = E[\phi(x_1, \dots, x_k)]$. The only nonzero terms in this expression are those 
for which at least one of the j-suffixes is equal to an i-suffix. The number of such 
terms is not greater than 

$(n^{[k]})^2 - n^{[k]} (n-k)^{[k]} = (n^{[k]})^2 \Bigl\{1 - \frac{(n-k) \cdots (n-2k+1)}{n(n-1) \cdots (n-k+1)}\Bigr\}$ 
$\le (n^{[k]})^2 \Bigl\{1 - \Bigl(1 - \frac{k}{n}\Bigr) \cdots \Bigl(1 - \frac{2k-1}{n}\Bigr)\Bigr\}$ 
$\le (n^{[k]})^2 \, \frac{k + (k+1) + \cdots + (2k-1)}{n} = (n^{[k]})^2 \, \frac{k(3k-1)}{2n}$. 

Also, 

$E\{[\phi(x_{i_1}, \dots, x_{i_k}) - \bar\phi][\phi(x_{j_1}, \dots, x_{j_k}) - \bar\phi]\} \le \mathrm{Var}[\phi(x_1, \dots, x_k)]$ 

for all sets $i_1, \dots, i_k$ and $j_1, \dots, j_k$. Hence the result. 
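
The counting step in the proof of Lemma III can be checked by brute force for small $n$ and $k$. The sketch below assumes the bound is read as $(n^{[k]})^2 k(3k-1)/(2n)$, with $n^{[k]} = n(n-1)\cdots(n-k+1)$; the function names are hypothetical and the enumeration is only practical for small cases.

```python
from itertools import permutations


def falling(n, k):
    """n^{[k]} = n(n-1)...(n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def check_pair_count(n, k):
    """Return (brute-force count, closed form, upper bound).

    The count is the number of ordered pairs of k-tuples of distinct
    suffixes from {0, ..., n-1} that share at least one suffix.
    """
    tuples = list(permutations(range(n), k))
    count = sum(1 for a in tuples for b in tuples if set(a) & set(b))
    closed_form = falling(n, k) ** 2 - falling(n, k) * falling(n - k, k)
    bound = falling(n, k) ** 2 * k * (3 * k - 1) / (2 * n)
    return count, closed_form, bound


results = {(n, k): check_pair_count(n, k) for n, k in [(5, 1), (5, 2), (6, 2), (7, 3)]}
```

In each case the brute-force count agrees with the closed form and does not exceed the bound; for $k = 1$ the bound is attained exactly.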
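LEMMA IV. For any given $\delta > 0$, 

(2.12) $\{M_{11}, \dots, M_{1n}, M_{21}, \dots, M_{2n}, \dots, M_{r1}, \dots, M_{rn}\} = o_p(n^{\delta})$. 

PROOF. From Lemma III, 
$\mathrm{Var}[M_{s,t}] \le \tfrac{1}{2} k(3k-1) n^{-1} \mathrm{Var}\{[f_t(x_1, \dots, x_k)]^s\} \le \tfrac{1}{2} k(3k-1) C_{2s}/n$. 
Hence 

$\Pr\{|M_{s,t} - E(M_{s,t})| > n^{\delta/2}\} \le \tfrac{1}{2} k(3k-1) \cdot C_{2s}/n^{1+\delta}$. 

Again from assumption (A), $|E(M_{s,t})| \le C_s$, so that 
$\Pr\{|M_{s,t}| < 2n^{\delta/2}, \; s = 1, \dots, r; \; t = 1, \dots, n\} > 1 - \tfrac{1}{2} k(3k-1)[C_2 + \cdots + C_{2r}]/n^{\delta}$ 

for sufficiently large $n$. Hence the result. 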

In all that follows, whenever we consider a sum $\sum_{i=1}^{n}$ in which a suffix $i + p 
> n$ occurs, we shall take its value to be $i + p - n$. The same interpretation 
will hold for pairs $i$ and $j$ for which $|i - j| < k$, which will be considered to 
hold for values of $i$ and $j$ so that $i - j \pmod{n} < k$. We now obtain an asymptotic 
expression for $M_2$. 

LEMMA V. 

(2.13) $nM_2 =_p [A_1 - A_2 - A_3]/n$ 
where 


* E'\ g(a; °° * Lis ) g(a; ae 


i~j|<k 


DE’ (giles «++ ae) giten * °° 
‘ k 


)\« 


| n k , : 
lAg = - 7 > E'\gi(ar +++ ay) gj(te, * °° 


\ NT ijml apel 


Za.8 = = E'\gi(ay +++ 24) giao, ++ * Ley)} 


i,j=l 


where $\sigma_\beta = \alpha$ $(\alpha, \beta \le k)$, that is, the $\beta$th suffix $\sigma_\beta$ is equal to $\alpha$, where of course both 
$\beta$ and $\alpha$ are less than or equal to $k$, and other suffixes do not belong to $1, 2, \dots, k$. 
The summation is taken for all such values of $\alpha$ and $\beta$. 

PROOF. 


L ENCogie. 


n 


Age 


NM” \i-jick 


> - {gil 


Xx) nibs + 


n 


pa E’ 
i—j k 
n > plgi(ar oan 
>. 


n! 


ys Dri gila 


I\<k 


+ ps gi(t °°: 
jt—J| <k 
+ Ped len) 


these 


‘ ' 
; Lisn—v) } 


E' \qilai 


- 2,) 


te) X he 2» 9j(Lega °° 
ja 


Here >>” represents summation for sets of values of o; - - - 
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2 


* Li¢n—) 9j\rj 


* Le) Gi Veg °° 


Ie ) } 


Te) Gj(Aeqa ** 
2k) 


Xa) } 


De) Oi\Tega * °° 


ni?) 


~~ 


Le Gi (Xe; 


X,) Oj (Pegs mee 


Ye \ 
2x) | 


zm). 


ao, , at 


x-suffixes being common with an x-suffix of the first g-factor g,(a --- 


The last term in the expression (2.15) is zero. In the summation 


Lis J 

P < p “ee. 

ni2hl \ g(r 

when two or more of o; «++ o,% are 
, . 2k—2 

of these terms is of order O(n 

summation }>;;, the number of 


x) — 95(Xe, 


common with the suffixes 1 to /, the number 
) and, considering the 
such terms is O(n™) 
and IV, all these terms are simultaneously of the order 0,(n’). 


° 2) 


} 


values of 7 
Also from 


least one of 


Xx). 


and j in the 
Lemmas I] 


Thus the sum of 


(2k }+1 


é . . . . rr 

), while the denominator in nM, is n rhus 
we consider, in the above summation, only those terms for which exactly one of 
- a is equal to one of the suffixes 1, 2, --- , k; hence the result. 


Lemma VI. 


. 2k+ 
these terms is of order o,(n 


a1 


ae 


MN \i-—j|\<k 


; p= E\filay ++ 


n Jick 


ee > Et filay -+ ay) file, 


“ ij=ml ase 


> Elfiar «+ 


1 t= 


nMo E\ fila, Ui +s ») fila; sick 


ry Mi (x, +1 
\2 


x,)|> 
where $\sigma_1, \dots, \sigma_k$ is a fixed set of numbers different from 1 to $k$, except $\sigma_\beta = \alpha$ $(\alpha, \beta \le k)$ 
as before. 
PROOF. 


l nn : 
n a E' (gslay +++ Xx) Ji+p(Cp4. ‘t+ Dpen) j 
t=1 


( n ) 
)Il : 
= KE i= » Sila ++ Tn) feep(tpys * °° Tp+t) p~ 
\ t=] 


Here U;, is a U-statistic and 


vari DS files +++ Oe) ferp(Upsr *** Lyre) \ 
(N tml / 
s , L E\ fay oo Da) flay coe Dy) Serp(Lpur Pee Y p+k) Serp(Zpt1 — Lp+k)}. 
From assumption (A), by the application of Schwarz's inequality, 
$E\{f_t(x_1, \dots, x_k) \, f_t(x_1, \dots, x_k) \, f_{t+p}(x_{p+1}, \dots, x_{p+k}) \, f_{t+p}(x_{p+1}, \dots, x_{p+k})\}$ 

is bounded and the bound depends on the constants $C_s$ only. Thus from Lemma 
III, 


(ML tum 


wil e,, 
EY {- Do fxs o°¢ Dp) Ser p(tp+1 = Lp+k) > 


Ic, ‘ ‘ 
~~ . 2d E{ fda a te) Serp(Lpvr ne Lp+k)} 
t= 


My =,{2 > El fdas --- n))) = ui. 
Hence 
= Oo, Ei todas -++ wevnadonles ++ Bise)) 
eS ela. os «teal Shes +++ tuna th ~ ipl. 


1 | t-j|<k 


l » . 
a E {gi(ay ene Xp) gi(Xes1 os rm) } 


TM | i-—jl<k 


l s 
=o- DL Ef fiar-++ te) filress +> t)} — (2k — 1)yi. 


NL \i-j\<k 


Consider the U-statistic 


( = 
BS yo filar- ++ Xe) filtey +++ Loy) ?- 


* éjuel 

Since $\mathrm{Var}\{n^{-2} \sum_{i,j=1}^{n} f_i(x_1, \dots, x_k) f_j(x_{\sigma_1}, \dots, x_{\sigma_k})\}$ is bounded from assumption 
(A), we have from Lemma III 


nm ? ni? ij=l 


ab a. , Sy ee eee (2 > El fis + noi}. 
Hence the result. 

The condition $(B_1)$ may now be stated as 
$(B_1)$ $\liminf \mu_2 > 0$. 
This implies $1/nM_2 = O_p(1)$ and hence from Lemma C (1.3) we have 
(2.16) $nM_2 \sim_p (1/n)[A_1 - A_2 - A_3]$. 


Reduction of linear P-sets. In the expansion of $M_r$, from (2.10), all the terms 
consist of P-sets of type I. For given values of the indices $a_1, \dots, a_{l_1+\cdots+l_m}$, we 
may group together terms with different suffixes for the g-factors such that 
there are $m$ P-sets of type I and lengths $l_1, \dots, l_m$ and with given structures. 
Such a sum is represented by 

(2.17) $\sum'_q E'\Bigl\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j}, \dots, x_{q_j+k-1})]^{a_j}\Bigr\}$. 


Let $q_1$ be the q-suffix of a linear P-set in (2.17). For fixed values $\pi_2, \dots, \pi_e$ of 
$q_2, \dots, q_e$, we consider the summation $\sum'_{q_1}$ for $q_1$ satisfying the above conditions, 
that is, $|q_1 - \pi_i| > k$ $(i = 2, 3, \dots, e)$: 

(2.18) $\sum'_q \sum_P \prod_{j=1}^{e} [g_{q_j}(x_{q_j}, \dots, x_{q_j+k-1})]^{a_j} = \sum_\pi \sum_P \prod_{j=2}^{e} [g_{\pi_j}(x_{\pi_j}, \dots, x_{\pi_j+k-1})]^{a_j} \times \sum'_{q_1} \sum'_P g_{q_1}(x_{i_1}, \dots, x_{i_k})$ 

where $\sum'_P$ denotes summation for all $(i_1, \dots, i_k)$, none of the suffixes belonging 
to $\pi_i + j$ $(i = 2, \dots, e; \; j = 0, 1, \dots, k-1)$. Thus 

(2.19) $\sum'_{q_1} \sum'_P g_{q_1}(x_{i_1}, \dots, x_{i_k}) = \sum_{q_1=1}^{n} \sum_P g_{q_1}(x_1, \dots, x_k) - \sum_{q_1=1}^{n} \sum''_P g_{q_1}(x_{\sigma_1}, \dots, x_{\sigma_k}) - \sum''_{q_1} \sum'_P g_{q_1}(x_{i_1}, \dots, x_{i_k})$. 

Here $\sum''_P$ denotes summation for x-suffixes $\sigma_1, \dots, \sigma_k$ at least one of which is 
common with the suffixes $\pi_i + j$, and $\sum''_{q_1}$ is the summation for all values of 
$q_1$ such that $|q_1 - \pi_i| < k$ for some $i$ $(i = 2, \dots, e)$, that is, $q_1$ is tied to one of 
the q-suffixes $\pi_2, \dots, \pi_e$. Since 

$\sum_{q_1=1}^{n} \sum_P g_{q_1}(x_1, \dots, x_k) = 0$, 

$\sum'_{q_1} \sum'_P g_{q_1}(x_{i_1}, \dots, x_{i_k})$ is given by the last two sums of (2.19). By applying the 
process of reduction to $\sum'_q E'\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j}, \dots, x_{q_j+k-1})]^{a_j}\}$ we get summations 
of two kinds. In the first kind of summation, $e - 1$ g-factors form P-sets of 
type I as in (2.17) with the same structures of P-sets and a g-factor $g_{q_1}(x_{\sigma_1}, \dots, x_{\sigma_k})$, 
where $\sigma_1, \dots, \sigma_k$ have at least one element common with the x-suffixes of the $e - 1$ 
other g-factors, and $q_1$ assumes all values from 1 to $n$. In this case the number 
of P-sets is reduced by one. In the second kind of summation, $e - 1$ g-factors 
form $m - 1$ P-sets of type I as in (2.17), with the same structures and a g-factor 
$g_{q_1}(x_{i_1}, \dots, x_{i_k})$ such that $q_1$ is tied to a q-suffix of the $m - 1$ P-sets of type I. 
These two kinds of summation correspond to the summations $\sum_{q_1=1}^{n} \sum''_P$ and 
$\sum''_{q_1} \sum'_P$ of (2.19). 
We may apply the reduction process again and by such successive reductions 
we get the following type of summation. Let 

(2.20) $M(p, t, v, \beta, \gamma) = \sum'_q \sum_P \prod_{j=1}^{e} [g_{q_j}(x_{j,1}, \dots, x_{j,k})]^{a_j}$. 

The factors in (2.20) belong to three sets A, B and C and have $p$ different 
x-suffixes altogether. The $e - \beta - \gamma$ g-factors of A form $t$ P-sets of type I (ignoring 
the other g-factors of B and C). The $\gamma$ g-factors of C are tied to the g-factors 
of A, such that $\gamma_1$ g-factors are tied to $g_{q_1}$, $\gamma_2$ to $g_{q_2}$, etc., with $\sum \gamma_i = \gamma$, but 
have no common x-suffixes. The $\beta$ g-factors of B have common x-suffixes with 
other g-factors forming P-sets which always contain a g-factor of A or C. The 
g-factors of A, B and C are so related in the M-sets $M(p, t, v, \beta, \gamma)$ that there 
are $v$ linear P-sets of A, unconnected with g-factors of B and with no g-factors 
of C tied to them, which we call free linear P-sets. The summation $\sum'_q$ is 
taken for q-suffixes, so that the q-suffixes of A may assume any values between 
1 and $n$, subject to the restriction that these correspond to $t$ P-sets of type I, 
with given structures and given values of the $a_j$'s. The q-suffixes of the g-factors of 
C assume all possible values consistent with their relation to the g-factors of A. 
The q-suffixes of the g-factors of B may assume any value between 1 and $n$. 

The M-set $M(p, t, v, \beta, \gamma)$ depends upon the structures of P-sets, which is the 
same for all terms of $M$, and on the way in which the q-suffixes of C are tied to 
the g-factors of A, which together determine the structural relations of an 
M-set. The structural relations of an M-set may be of any kind, except that the 
g-factors of A form $t$ P-sets of type I and there are $v$ free linear P-sets. Since 
the number of x-variables in any term of the M-set is at most $p + k(\beta + \gamma)$, 
the number of M-sets with different structural relations (including tied q-suffixes) 
is bounded and independent of $n$. The process of reduction, outlined 
above, may be applied to any M-set $M(p, t, v, \beta, \gamma)$, where we reduce a free 
linear P-set of A, given by the suffix $q_1$ (say). If the sum $\sum'_{q_1} \sum'_P g_{q_1}(x_{q_1}, \dots, 
x_{q_1+k-1})$ is expressed as in (2.19), $M(p, t, v, \beta, \gamma)$ can be expressed linearly in a 
finite number of M-sets of the type 

(2.21) $M(p - \omega_1, t - 1, v_1, \beta + 1, \gamma)$, $M(p - \omega_1, t - 1, v_2, \beta, \gamma + 1)$, 

where the suffixes $\sigma_1, \dots, \sigma_k$ have $\omega_1$ suffixes common with other g-factors of A 
or C. Also 

(2.22) $v_1 \ge v - \omega_1 - 1$ and $v_2 \ge v - \omega_1 - 2$ 

hold in this case. The minimum of $v_1$ corresponds to the case when all $\omega_1$ x-suffixes 
are from different free linear P-sets. The minimum of $v_2$ corresponds to the case 
when, moreover, the reduced P-set becomes a g-factor of C and is tied to another 
linear P-set. We shall call a pair of g-factors, where one is tied to the other, a 
Q-set, when they have no common x-suffix. 

We have already defined in (2.17) and (2.20) the summation $\sum'_q$, which is 
a q-sum retaining the structural relations of the g-factors. We now consider 
another kind of summation $\sum_q$, where the summation is taken for all possible 
structural relations consistent with $t$, $v$ and for given sets of indices $(a_1, \dots, a_{l_1})$, 
etc., of the P-sets of A. We shall now prove the following lemma. 


LEMMA VII. When $v$ is the number of linear P-sets and $m$ the number of P-sets 
of type I, 

(2.23) $\frac{1}{n^{r/2}} \sum_q E'\Bigl\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j}, \dots, x_{q_j+k-1})]^{a_j}\Bigr\} = o_p(n^{m - [(v+1)/2] + \delta - r/2})$ 

where the summation $\sum_q$ has the same sense as in (2.10), and $a_1, \dots, a_e$ are 
constants such that $\sum a_j = r$ and $[(v+1)/2]$ is the integral part of $(v+1)/2$. 
PROOF. We may write 

$\sum'_q E'\Bigl\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j}, \dots, x_{q_j+k-1})]^{a_j}\Bigr\} = \frac{M(p_0, m, v, 0, 0)}{n^{[p_0]}}$ 

where $p_0$ is the number of different x-suffixes. By successive application of the 
reduction process to free linear P-sets we may express $M(p_0, m, v, 0, 0)$ linearly 
as 

(2.24) $M(p_0, m, v, 0, 0) = \sum (-1)^j M(p_0 - \omega_1 - \cdots - \omega_j, m - j, v_j, \beta, \gamma)$ $(j = \beta + \gamma)$ 

where $\beta$ of the reduced P-sets belong to B and $\gamma$ to C, and $\omega_1, \dots, \omega_j$ represent 
the number of x-suffixes reduced at different stages. Each term of 
$M(p_0 - \omega_1 - \cdots - \omega_j, m - j, v_j, \beta, \gamma)$ corresponds to different combinations 
of the $p_0 - \omega_1 - \cdots - \omega_j$ x-suffixes from 1 to $n$ and to variations of the q-suffixes 
with proper restrictions. Thus from Lemma IV, for fixed values of the q-suffixes, 
the sum of terms in the M-set is of the order $o_p(n^{p_0 - \omega_1 - \cdots - \omega_j + \delta})$. Again the 
number of such sums in the M-set is of the order $O(n^{m-j+\beta})$, since the number of 
ways in which the q-suffixes of A can be chosen is $O(n^{m-j})$, while the number 
of ways in which the q-suffixes of B can be chosen independently of the choice 
of q-suffixes of A is of the order $O(n^{\beta})$, and the q-suffixes of C are tied to those 
of A. 

Thus $M(p_0 - \omega_1 - \cdots - \omega_j, m - j, v_j, \beta, \gamma)$ is of the order $o_p(n^{p_0 + m - j + \beta - \omega_1 - \cdots - \omega_j + \delta})$. 
If at the $i$th stage of reduction, a linear P-set is reduced to a g-factor of B, then 
the corresponding $\omega_i > 0$ and thus 

(2.25) $\omega_1 + \omega_2 + \cdots + \omega_j \ge \beta$. 
Also from (2.22) 
(2.26) $v_j \ge v - \omega_1 - \cdots - \omega_j - j - \gamma$. 

The process of reduction may be continued until $v_j = 0$. Then from (2.26) 
$\omega_1 + \omega_2 + \cdots + \omega_j + j + \gamma \ge v$, $2j \ge v$, $j \ge [(v+1)/2]$, 
$\omega_1 + \omega_2 + \cdots + \omega_j + \gamma \ge \beta + \gamma \ge [(v+1)/2]$. 

Hence $M(p_0 - \omega_1 - \cdots - \omega_j, m - j, v_j, \beta, \gamma)$ is of the order $o_p(n^{p_0 + m - [(v+1)/2] + \delta})$ 
and 

(2.27) $\frac{1}{n^{r/2}} \sum'_q E'\Bigl\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j}, \dots, x_{q_j+k-1})]^{a_j}\Bigr\} = o_p(n^{m - [(v+1)/2] + \delta - r/2})$. 

Since the number of P-sets with different structures is bounded, (2.23) follows 
from (2.27). 

It follows from above that we need consider only such M-sets for which w; 
assumes only the values one and zero, since in all other cases m — [(v + 1)/2] < 
r/2 and the sum of all these terms converges stochastically to zero, from what 
has just been proved. These correspond to cases where a linear P-set is reduced 
to a g-factor of B, which is connected with a free linear P-set of A by common 
x-suffix forming a P-set of type II, or it is reduced to a g-factor of C, when it is 
tied to a free linear set of A, forming a Q-set. We shall now prove the following 
theorem. 

THEOREM I. Let $x_1 \cdots x_n$ be a sequence of independent random variables with the absolutely continuous distribution function F(x), and let $\{f_i(x_1 \cdots x_k)\}$ be a sequence of functions of $x_1 \cdots x_k$ such that the conditions (A) and (B_1) hold. Then the nonparametric distribution of the nonsymmetric statistic

$$\Big[\frac{1}{n}\sum_{i=1}^{n} f_i(x_i \cdots x_{i+k-1}) - M_1\Big] \Big/ \sqrt{M_2}$$

converges stochastically to the normal distribution with mean zero and variance unity.


PROOF. From Lemma VII, in the expression (2.10) for $M_r$ the term

$$\frac{1}{n^{r/2}} \sum_q E'\Big\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j} \cdots x_{q_j+k-1})]^{a_j}\Big\}, \qquad \sum a_j = r,$$

is of order $o_p(n^{m-[(v+1)/2]-r/2+\epsilon})$ when the number of P-sets is m, of which v are linear. Now

$$2(m - v) + v \le r \quad\text{or}\quad m - v/2 \le r/2,$$

where equality holds when and only when all nonlinear P-sets are of length two. Thus $m - [(v+1)/2] < r/2$ when r is odd and in this case, from (B_1),

(2.29)  $$\operatorname{Plim} M_r / M_2^{r/2} = 0.$$


When r is even, we need consider only terms with 2u linear P-sets and r/2 — u 
P-sets of length two, u assuming all values from 0 to r/2. Let v = 2u. Then, 
considering highest order terms in M(p, m, 2u, 0, 0), we get two kinds of M-sets 
after the first stage of reduction, for example, M(p — 1, m — 1, 2(u — 1), 1,0), 
and M(p,m — 1, 2(u — 1), 0, 1), both of which have negative signs and contain 
2(u — 1) free (untied) linear P-sets. In M(p — 1, m — 1, 2(u — 1) 1, 0), there 
is one P-set of length two and type II, corresponding to each way of combination 
of a linear P-set given by q (say), with any other linear P-set. In 
M(p,m — 1, 2(u — 1), 0, 1) there is a Q-set corresponding to each way of tying 
up a linear P-set with another one. 

Proceeding in this manner, we get, after the uth stage, a sum of M-sets with the coefficient $(-1)^u$, for which there are $\beta$ P-sets of length two and type II, one g-factor of such a P-set belonging to B and the other to A, and $\gamma$ tied pairs of linear P-sets forming Q-sets of length two. These $\beta$ P-sets of type II and $\gamma$ Q-sets are derived in all possible manners of grouping the 2u linear P-sets. Thus we may write, considering highest order terms in n only, for $a = e - 2u$ and $b = e - 2u + \beta$,


l - e ‘ aad 
q 


\i= 


1 (7 

— oe be Aw Rie II (Go ;(%e, *** Las+k- ») F 
no) 8 q j=l 

(2.30) ° 

x IT (—1)ge,(reua eas Teh) )Goige62- ra jaa Vaigatk 1) 


t= 


7 ) 
x II (- 1) ge (ae ay Ty 5h) Goige\Teian =o Vain otk- »} 


where the q-suffixes $q_j$ belong to A and the $\xi_i$ belong to B and the $\zeta_j$ to C. Since $\beta + \gamma = u$ and $p_0$ is reduced to $p_0 - \beta$, the summation $\sum_q$ is taken for different structural relations of the g-factors such that the x-suffixes $\xi_i(1) \cdots \xi_i(k)$, $\zeta_j(1) \cdots \zeta_j(k)$ are all different for all i, j ($1 \le i \le \beta$, $1 \le j \le \gamma$) and different from the x-suffixes of A except that for sets of type II,

$$\xi_i(x) = q_{e-2u+i} + y - 1$$

holds for just one pair of values of x and y between 1 and k, for each term in the summation. Also $\beta$ and $\gamma$ have all possible values satisfying $\beta + \gamma = u$. We may thus write, considering highest order terms in n only,
yar /2 LE QT] (gu ;(2q, ar Hq j+h 1) » , 


(jai 


(—1)* C.. ‘ 
2. )" 36 2. IT [Quj(Loy °° * Leppe—t)] 7\ 


nls }+r/ 2 


8 
x {> > i Zi gi(ra -** Dp) Gi\Xe, * °° ta) 


\ syed ij=l ) 
\7 
X < +o | iti g(x °° Zp) Gi( Xess ae Top) 
tj 
where $\sigma_1 \cdots \sigma_k$ have just one element common with $1, 2, \cdots, k$ and where $C_{\beta\gamma}$ is the number of ways in which 2u linear P-sets can be divided into two groups of $2\beta$ and $2\gamma$ g-factors, forming P-sets of type II and Q-sets, respectively. Thus

(2.32)  $$C_{\beta\gamma} = \frac{(2u)!}{(2\beta)!\,(2\gamma)!} \cdot \frac{(2\beta)!}{2^{\beta}\,\beta!} \cdot \frac{(2\gamma)!}{2^{\gamma}\,\gamma!} = \frac{(2u)!}{2^{\beta+\gamma}\,\beta!\,\gamma!}.$$
We thus have 


(2.33)  $$\frac{1}{n^{r/2}} \sum_q E'\Big\{\prod_{j=1}^{e} [g_{q_j}(x_{q_j} \cdots x_{q_j+k-1})]^{a_j}\Big\} = \frac{1}{n^{r/2}}\,\frac{(2u)!}{2^{\beta+\gamma}\,\beta!\,\gamma!} \sum_q E'\Big\{\prod_{j=1}^{e-2u} [g_{q_j}(x_{q_j} \cdots x_{q_j+k-1})]^{a_j}\Big\} \times [-A_2]^{\beta} \times [-A_3]^{\gamma},$$

where $g_{q_1} \cdots g_{q_{e-2u}}$ denote the g-factors of nonlinear P-sets of type I.

The nonlinear P-sets are of type I and are obtained from every manner of grouping r − 2u factors in $[\sum g_i(x_i \cdots x_{i+k-1})]^r$ into pairs. Thus from Lemma VII, considering highest order terms only, we have


| vit ail 
nr!? es Kk’ } IT (9a ;(%a, ak Tqj+h—1)] 
q j= 


* (Qu = 
*n™'2(B! ee | Dee E I “Geile ** La5+h—1) Ga’ Xa", 4 re | 


j=l 


x [—Aal® & [—Ag]” where |g; —93)| <k 


(2u)! , \)1Cr/2)—u 
=o aD EL De odes +++ xesns) giles +++ ty401)}] 
x [—Ad]? & [—Ag]” 
| (Qu)! 
? n’!? (B!)(y!)2* 


2.34) 


[Ay ?—* (— Aa)’ [— Al’, 


neglecting terms of lower order in n which tend stochastically to zero. 

We now find the coefficient $C_r\{(1), (1), \cdots, (1,1), \cdots, (1,1)\}$ with 2u linear P-sets. These 2u factors can be chosen from r factors in $\binom{r}{2u}$ ways, and the remaining r − 2u factors can be grouped into pairs corresponding to P-sets of length two and type I in $(r - 2u)!/\{[(r/2) - u]!\,2^{(r/2)-u}\}$ ways. The number of terms of the type (2.34) in $M_r$ is

$$\frac{r!}{(2u)!\,(r - 2u)!} \cdot \frac{(r - 2u)!}{[(r/2) - u]!\,2^{(r/2)-u}} = \frac{r!}{(2u)!\,[(r/2) - u]!\,2^{(r/2)-u}}.$$
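The count just stated can be checked by brute force for small r. The following is only an illustrative sketch: the function names are hypothetical and do not come from the paper. It enumerates every way of leaving 2u of r labelled factors unpaired (linear) and pairing the remaining r − 2u, and compares the total with the closed form.

```python
from itertools import combinations
from math import factorial


def closed_form(r, u):
    """r! / ((2u)! ((r/2)-u)! 2^((r/2)-u)) for even r and 0 <= u <= r/2."""
    half = r // 2
    return factorial(r) // (factorial(2 * u) * factorial(half - u) * 2 ** (half - u))


def perfect_matchings(items):
    """Number of ways to split a tuple of labelled items into unordered pairs."""
    if not items:
        return 1
    rest = items[1:]
    # Pair the first item with each remaining item in turn, then recurse.
    return sum(perfect_matchings(rest[:i] + rest[i + 1:]) for i in range(len(rest)))


def brute_force(r, u):
    """Choose 2u factors to stay linear, then pair the remaining r - 2u."""
    factors = tuple(range(r))
    total = 0
    for singles in combinations(factors, 2 * u):
        remaining = tuple(x for x in factors if x not in singles)
        total += perfect_matchings(remaining)
    return total
```

For instance, `closed_form(6, 1)` and `brute_force(6, 1)` both count the groupings of six factors into two linear factors and two pairs.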


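Thus

$$M_r \sim_p \frac{r!}{n^{r/2}} \sum_{u}\,\sum_{\beta+\gamma=u} \frac{1}{\beta!\,\gamma!\,\{(r/2) - u\}!\;2^{r/2}}\,[A_1]^{(r/2)-u}\,[-A_2]^{\beta}\,[-A_3]^{\gamma} = \frac{r!}{n^{r/2}\,2^{r/2}\,(r/2)!}\,(A_1 - A_2 - A_3)^{r/2}$$

from assumption (B_1). Hence from (2.16) and Lemma C (1.3),

(2.36)  $$\operatorname{Plim} M_r / M_2^{r/2} = \frac{r!}{2^{r/2}\,(r/2)!}.$$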


Theorem I follows from (2.29), (2.36) and the theorem on stochastic convergence 
of distribution functions [3]. 
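Theorem I can be illustrated numerically. The sketch below is a Monte Carlo illustration under stated assumptions, not the paper's method: it takes k = 2 with f(a, b) = ab, treats the indices circularly, estimates M_1 and M_2 from random permutations rather than from all n! of them, and uses hypothetical function names throughout.

```python
import random
import statistics


def serial_product(z):
    """(1/n) * sum of z[i] * z[i+1], indices taken circularly (an assumption)."""
    n = len(z)
    return sum(z[i] * z[(i + 1) % n] for i in range(n)) / n


def permutation_values(x, draws, rng):
    """Values of the statistic over random permutations of the fixed sample x."""
    z = list(x)
    values = []
    for _ in range(draws):
        rng.shuffle(z)
        values.append(serial_product(z))
    return values


def standardise(values):
    """Subtract the Monte Carlo estimate of M_1 and divide by the root of that of M_2."""
    m1 = statistics.fmean(values)
    m2 = statistics.pvariance(values, mu=m1)
    return [(v - m1) / m2 ** 0.5 for v in values]


rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(200)]   # a skewed parent distribution
t = standardise(permutation_values(x, 2000, rng))
inside = sum(abs(v) <= 1.96 for v in t) / len(t)
print(f"share of standardised values within +/-1.96: {inside:.3f}")
```

If the normal limit is a good approximation at this sample size, the printed share should be near 0.95.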


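3. Joint distribution of two or more serial statistics. We shall now generalise Theorem I to two sets of functions $\{f_i^{(1)}(x_1 \cdots x_k)\}$ and $\{f_i^{(2)}(x_1 \cdots x_k)\}$ satisfying condition (A). Let

(3.1)  $$F_n^{(1)} = \frac{1}{n}\sum_{i=1}^{n} f_i^{(1)}(x_i \cdots x_{i+k-1}), \qquad F_n^{(2)} = \frac{1}{n}\sum_{i=1}^{n} f_i^{(2)}(x_i \cdots x_{i+k-1}),$$
$$M_{r,s} = E'\{[F_n^{(1)} - \bar F^{(1)}]^r\,[F_n^{(2)} - \bar F^{(2)}]^s\},$$

where $E'(F_n^{(1)}) = \bar F^{(1)}$ and $E'(F_n^{(2)}) = \bar F^{(2)}$. When condition (B_1) is satisfied by the functions $f_i^{(1)}$ and $f_i^{(2)}$, and a condition (B_2), to be stated later, is satisfied, the limiting form of the joint nonparametric distribution of $(F_n^{(1)} - \bar F^{(1)})/\sqrt{M_{2,0}}$ and $(F_n^{(2)} - \bar F^{(2)})/\sqrt{M_{0,2}}$, for permutations of $x_1 \cdots x_n$, is a bivariate normal distribution with means zero, variances unity, and correlation coefficient $\rho$ (defined in B_2). Let

(3.2)  $$G_n = \frac{1}{n}\sum g_i(x_i \cdots x_{i+k-1}) \quad\text{and}\quad H_n = \frac{1}{n}\sum h_i(x_i \cdots x_{i+k-1}),$$

where

$$g_i(x_i \cdots x_{i+k-1}) = f_i^{(1)}(x_i \cdots x_{i+k-1}) - \bar F^{(1)},$$
$$h_i(x_i \cdots x_{i+k-1}) = f_i^{(2)}(x_i \cdots x_{i+k-1}) - \bar F^{(2)},$$

so that $E'(G_n) = 0$ and $E'(H_n) = 0$ and $M_{r,s} = E'[G_n^r H_n^s]$ for r, s = 2, 3, 4, ⋯.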
In the expression for $M_{r,s}$, we get products of g-factors and h-factors. As in Section 2, we define P-sets consisting of g-factors and h-factors connected by common x-suffixes. The P-sets consisting of g-factors only will be called P(g)-sets and those containing both g-factors and h-factors will be called P(g, h)-sets or mixed P-sets.

As in Section 2, we have P-sets of type I and II. A P-set is of type I when, by a permutation of $(x_1 \cdots x_n)$, it can be written as

$$\prod_j [g_{q_j}(x_{q_j} \cdots x_{q_j+k-1})]^{a_j}\,[h_{q_j}(x_{q_j} \cdots x_{q_j+k-1})]^{a'_j}$$

where $|q_j - q_{j+1}| < k$ holds (either $a_j$ or $a'_j$ may be zero but not both). All P-sets in the expansion of $M_{r,s}$ are obviously of type I.

A mixed P-set is of type II when, by a permutation of $x_1 \cdots x_n$, it can be expressed as $g_{q_1}(x_1 \cdots x_k)\,h_{q_2}(x_{\sigma_1} \cdots x_{\sigma_k})$, where $\sigma_i = j$ for i, j ≤ k and the other suffixes $\sigma_1 \cdots \sigma_k$ are different from 1 to k.

The relations of the g and h factors and the x-suffixes of a P-set which are invariant for permutations of $x_1 \cdots x_n$ will be called its structure. Two P-sets will be considered to have identical structure if one can be derived from the other by a permutation of $x_1 \cdots x_n$, both the q-suffixes and the distinction between g-factors and h-factors being ignored.

We may now write, 


| vit a 
(3.3) Mi = > 'C, (a, D dE (il (90; (te; *** ®ej4e—)]*! 
= 


a. 


€2 \ 
x Il (hej (aa; °° Sete)” * | , 


i=l } 


where , + --- + ln = e, and i + --- + I, = e. The summation >, is 
q,@° 
taken for all Po, h)-sets of type I with indices (a, --- a, , an ee a’) 


(typeset nyt °°" A pertin » Ot gtnng al °** Giiy-4ie), Whered)a; = 
r, Da; = , Th = ¢,, and > I; = e,. The summation )~’ is taken for all 
systems of a of a +++ a, and a «+ a, satisfying }>a; = rand >@, = 8 
and l,, -::,la,h, °°: , be, with appropriate coefficients 


, , 


Crie(a, 1) = Cyl (ar -°* ay, ar °° arr) o°* (ny..4a,_y41 °°? O11, 4. tm) ty 
corresponding to the number of ways of grouping r g-factors and s h-factors 
into P-sets with given indices. 

It is easy to see that the Lemmas I to IV may easily be generalised to the 
present case. 

Considering products of g-factors only or of h-factors only, stochastic asymptotic expressions for $M_{2,0}$ and $M_{0,2}$ are given by Lemma V. We now find the asymptotic expression for $M_{1,1}$.

LEMMA V*.

(3.4)  $$nM_{1,1} \sim_p [A_{1,g,h} - A_{2,g,h} - A_{3,g,h}]$$

where


(Aae.h = yo BE’ \ g(x, °c * Lite ) h;(x; °° * Dirk | 


i—j|<k 


= | Aso.n = ps EB’ \ gla nes re) hy(Xepa ‘= tm) | 
(3.5) 4 i—j|<k 


k 


| Leva — SS pr SGiltr +++ te) Aj(te, +++ Les) 
teaa = Be Senate FE wr {m 9) il, tal) 
\ sym ay=l i=l \ n 
where x, y in $\sum$ assume all values from 1 to k and $\sigma_x = y$ (x, y ≤ k) and other suffixes $\sigma_1 \cdots \sigma_k$ are different from 1 to k.
PROOF.


L ow 
Mia = n? E 4>2 gilt --° isn) [20 hj(a; --- tj4n—-1)|} 


] " : 
: n? » E gia ‘t+ Dipe—w hj(x; re Li+n—a } 


i~—j|<k 


$i TS pick!) WeSC RN 


n? ti—j| 2% 
5 E’\gilay +++ ty) Aj(aegn +++ Den) } 
i—j|>k 


Tx) Wyler ** > Te) | 


n {2h 


_  Dalote - 


i,j=l 


Del gi(as -+ > tx) hy(ugn +++ tau)} 


{t—j| <k nik) 


Since 
D dp Altes «+ tu) = 0, 


j=l 


the above expression can be written as 


I ” 
4) nl2*) p> Le» gi(as 7 Xx) 2» h(x, 77a Ze) 


+ a ws gilts +--+ te) hy(egs - °° Tex) | 
}*—-7| <b 
where b Ry denotes, as before, a sum for sets of values of o; --- o; at least one 
of which belongs to 1 to k. As in Lemma V, considering highest order terms in 
n, the result follows, since the sum of all other terms converges stochastically 
to zero. 
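LEMMA VI*.

$$nM_{1,1} \sim_p \mu_{1,1}$$

where

$$\mu_{1,1} = \frac{1}{n}\sum_{|i-j|<k} E\{f_i^{(1)}(x_i \cdots x_{i+k-1})\,f_j^{(2)}(x_j \cdots x_{j+k-1})\} - \frac{1}{n}\sum_{|i-j|<k} E\{f_i^{(1)}(x_1 \cdots x_k)\,f_j^{(2)}(x_{k+1} \cdots x_{2k})\}$$
$$\qquad - \frac{1}{n^2}\sum_{i,j=1}^{n}\sum_{x,y=1}^{k} E\{f_i^{(1)}(x_1 \cdots x_k)\,f_j^{(2)}(x_{\sigma_1} \cdots x_{\sigma_k})\} + k^2 \mu^{(1)} \mu^{(2)},$$

where $\sigma_1 \cdots \sigma_k$ have just one element common with 1 to k, $\sigma_x = y$ (x, y ≤ k), and

$$\mu^{(1)} = \frac{1}{n}\sum_{i=1}^{n} E\{f_i^{(1)}(x_1 \cdots x_k)\}, \qquad \mu^{(2)} = \frac{1}{n}\sum_{i=1}^{n} E\{f_i^{(2)}(x_1 \cdots x_k)\}.$$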


The proof of this lemma is analogous to that of Lemma VI and will be omitted. We now have the condition

(3.7)  (B_2)  $$\lim \mu_{1,1} / \sqrt{\mu_{2,0} \times \mu_{0,2}} = \rho$$

exists and is less than one in absolute value.

REDUCTION OF P-SETS. As in Section 2, we define the sum


«1 
M (p, t, v,-8, y) = a. Ps I] (9o;(tj.a + yd)" 


qq’ j=l 


(3.8) 


x II (hoi (Xj npn °° a;,%%))"i. 


Here there are three sets of g and h factors: $e_1 + e_2 - \beta - \gamma$ g or h factors of A form t P(g, h)-sets of type I, ignoring g or h factors of B or C; $\gamma$ g or h factors of C are tied to g or h factors of A; and $\beta$ g or h factors of B have common x-suffixes with other g or h factors forming P(g, h)-sets, which always contain a g or h factor of A or C and such that there are v free linear P-sets. The summation is for q-suffixes so that the q-suffixes of the g or h factors of A may assume any value between 1 and n, subject to the restriction that these correspond to t P(g, h)-sets of type I with given structures and indices of P-sets, while the q-suffixes of the g or h factors of C are tied to the q-suffixes of A and the q-suffixes of B are free and may assume any value between 1 and n.

The structural relation between these factors must be such that there are v linear P-sets among them which are not connected with the g or h factors of B by common x-suffixes and have no g or h factors of C tied to them, and which we call free linear P-sets.

We may now apply the reduction process described in Section 2 to free linear sets in an M-set. We get M-sets of type

(3.9)  $$M(p - w_1,\; t - 1,\; v_1,\; \beta + 1,\; \gamma), \qquad M(p - w_2,\; t - 1,\; v_2,\; \beta,\; \gamma + 1),$$

where $w_1 \ge 1$, $v_1 \ge v - w_1 - 1$, $w_2 \ge 0$, and $v_2 \ge v - w_2 - 2$. Also it may be shown that, retaining terms of highest order in n, we need consider only such M-sets for which either $w_1$ assumes the value one, corresponding to the case where the free linear P-set is reduced to a g or h factor of B and is connected with another untied (free) linear P-set forming a P-set of type II, or $w_2$ assumes the value zero when the free linear P-set is reduced to a g or h factor of C and is tied to a free linear P-set forming a Q-set.

Now in the expression of M,., all the terms consist of P-sets of type I. For 
given values of the indices a --- a, and a; --- a.,, we group together all 
terms with different q-suffixes, so that there are m P(g, h)-sets of type I with 
any structures, but with indices 


, , 
(ay +++ ay, , @y *** ay) 


, 


° (00d, +--+ +1 S99 Mig peootde, » ly +---4l_ n+l °° * 


Such a sum is represented as

(3.10)  $$\sum_{q,q'} E'\Big\{\prod_{j=1}^{e_1} [g_{q_j}(x_{q_j} \cdots x_{q_j+k-1})]^{a_j} \prod_{j=1}^{e_2} [h_{q'_j}(x_{q'_j} \cdots x_{q'_j+k-1})]^{a'_j}\Big\}.$$

Proceeding exactly as in Section 2 and from (3.9), we can prove
LEMMA VII*. Let m be the total number of P-sets and v linear P-sets in the expression (3.10). Then the sum (3.10) is of the order $o_p(n^{m-[(v+1)/2]+\epsilon})$.
We shall now prove the following theorem.
THEOREM II. Let $\{f_i^{(1)}(x_1 \cdots x_k)\}$ and $\{f_i^{(2)}(x_1 \cdots x_k)\}$, for i = 1, 2, 3, ⋯, be two sequences of functions satisfying the conditions (A), (B_1) and (B_2). Then the joint nonparametric distribution of

$$(F_n^{(1)} - \bar F^{(1)})/\sqrt{M_{2,0}} \quad\text{and}\quad (F_n^{(2)} - \bar F^{(2)})/\sqrt{M_{0,2}}$$

converges stochastically to the bivariate normal distribution with means zero, variances unity, and correlation coefficient $\rho$ given by (3.7).

PROOF. In the expression for $M_{r,s}$ we get sums of the type (3.10) with coefficients $1/n^{r+s}$. If in any such sum (3.10) there are m′ P-sets of which v′ are linear, then

$$2(m' - v') + v' \le r + s \quad\text{or}\quad m' - v'/2 \le (r + s)/2,$$

where equality holds only when all the nonlinear P(g, h)-sets are of length two. Thus $m' - [(v' + 1)/2] < (r + s)/2$ when r + s is odd. When r + s is even, we need consider sums (3.10) with 2u′ linear P-sets and (r + s)/2 − u′ P(g, h)-sets of length two. These nonlinear P(g, h)-sets are of type I and are obtained from every manner of grouping r + s − 2u′ factors in

$$\Big[\sum g_i(x_i \cdots x_{i+k-1})\Big]^r \Big[\sum h_i(x_i \cdots x_{i+k-1})\Big]^s$$

into pairs. Also by the process of reduction described before, we get a P-set of type II for each way of combining a g or h factor of B with another linear P-set, while we get a Q-set for each way of tying up a g or h factor of C with another linear P-set.
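After complete reduction of free linear P-sets in an expression of the form (3.10), with $\xi$ P(g)-sets and Q(g)-sets of length two, $\zeta$ P(h)-sets and Q(h)-sets of length two, and $\eta$ mixed P-sets of length two, we get, exactly in the same manner as in Section 2, terms like

(3.11)  $$[A_{1,g}]^{\omega_1}[A_{1,h}]^{\omega_2}[A_{1,g,h}]^{\omega_3}\,[-A_{2,g}]^{\beta_1}[-A_{2,h}]^{\beta_2}[-A_{2,g,h}]^{\beta_3}\,[-A_{3,g}]^{\gamma_1}[-A_{3,h}]^{\gamma_2}[-A_{3,g,h}]^{\gamma_3},$$

(3.12)  $$\omega_1 + \omega_2 + \omega_3 = (r + s)/2 - u, \quad \beta_1 + \beta_2 + \beta_3 = \beta, \quad \gamma_1 + \gamma_2 + \gamma_3 = \gamma; \quad u = \beta + \gamma,$$

(3.13)  $$\omega_1 + \beta_1 + \gamma_1 = \xi, \quad \omega_2 + \beta_2 + \gamma_2 = \zeta, \quad \omega_3 + \beta_3 + \gamma_3 = \eta; \quad 2\xi + \eta = r, \quad 2\zeta + \eta = s.$$

The $A_{1,g}, A_{2,g}, A_{3,g}$ and $A_{1,h}, A_{2,h}, A_{3,h}$ are given by (2.14) for functions $f_i^{(1)}$ and $f_i^{(2)}$, respectively, while $A_{1,g,h}$, $A_{2,g,h}$, and $A_{3,g,h}$ are given by (3.5).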

A term like (3.11) occurs in the reduced form of $M_{r,s}$ for each way of pairing r + s factors in

$$\Big[\sum g_i(x_i \cdots x_{i+k-1})\Big]^r \Big[\sum h_i(x_i \cdots x_{i+k-1})\Big]^s$$

in appropriate manner. Thus the number of terms (3.11) in the reduced form of $M_{r,s}$ is given by

(3.14)  $$K(r, s, \omega_1, \omega_2, \omega_3;\ \beta_1, \beta_2, \beta_3;\ \gamma_1, \gamma_2, \gamma_3).$$

This is the number of ways of choosing $2\xi$ factors from r g-factors and $2\zeta$ factors from s h-factors to form: $\omega_1$ P(g)-sets, $\omega_2$ P(h)-sets, and $\omega_3$ P(g, h)-sets of type I; $\beta_1$ P(g)-sets, $\beta_2$ P(h)-sets, and $\beta_3$ P(g, h)-sets of type II; and $\gamma_1$ Q-sets of g-factors, $\gamma_2$ Q-sets of h-factors, and $\gamma_3$ Q-sets of mixed type. Taking the sum of all values of $\omega_1, \omega_2, \omega_3, \beta_1, \beta_2, \beta_3, \gamma_1, \gamma_2, \gamma_3$ satisfying (3.12) and (3.13) we have, from assumptions (B_1), (B_2) and Lemma VI*,


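(3.15)  $$n^{(r+s)/2} M_{r,s} \sim_p \sum L(r, s, \xi, \eta, \zeta)\,(A_{1,g} - A_{2,g} - A_{3,g})^{\xi}\,(A_{1,h} - A_{2,h} - A_{3,h})^{\zeta}\,(A_{1,g,h} - A_{2,g,h} - A_{3,g,h})^{\eta}$$
$$\sim_p \sum L(r, s, \xi, \eta, \zeta)\,[\mu_{2,0}]^{\xi}\,[\mu_{1,1}]^{\eta}\,[\mu_{0,2}]^{\zeta}, \qquad r + s \text{ even},$$
$$\sim_p 0, \qquad r + s \text{ odd}.$$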


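where $L(r, s, \xi, \eta, \zeta)$ is the number of ways in which: $2\xi$ factors can be selected from r g-factors; $2\zeta$ factors can be selected from s h-factors; and $\xi$ pairs of g-factors, $\zeta$ pairs of h-factors, and $\eta$ mixed pairs of g and h factors may be formed. Thus

(3.16)  $$L(r, s, \xi, \eta, \zeta) = \frac{r!}{(2\xi)!\,(r - 2\xi)!} \cdot \frac{(2\xi)!}{2^{\xi}\,\xi!} \cdot \frac{s!}{(2\zeta)!\,(s - 2\zeta)!} \cdot \frac{(2\zeta)!}{2^{\zeta}\,\zeta!} \cdot \eta! = \frac{r!\,s!}{\xi!\,\eta!\,\zeta!\,2^{\xi+\zeta}}.$$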
From (3.15), (3.7), (1.3), and (B_2),

(3.17)  $$\frac{M_{r,s}}{[M_{2,0}]^{r/2}\,[M_{0,2}]^{s/2}} \sim_p \sum_{\eta} \frac{r!\,s!\,\rho^{\eta}}{2^{(r+s)/2-\eta}\,\eta!\,\{(r-\eta)/2\}!\,\{(s-\eta)/2\}!}, \qquad r + s \text{ even},$$
$$\sim_p 0, \qquad r + s \text{ odd},$$

where $\eta$ assumes even values only when r and s are even, otherwise assuming odd values only.


Obviously the right-hand side of (3.17) is the coefficient of $z_1^r z_2^s / r!\,s!$ in the expansion of

(3.18)  $$\exp\{\tfrac12 (z_1^2 + 2\rho z_1 z_2 + z_2^2)\}$$

which is the moment-generating function of the distribution

$$\frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\Big\{-\frac{1}{2(1 - \rho^2)}(x^2 - 2\rho x y + y^2)\Big\}.$$

Thus the result follows from an extension of Ghosh's theorem [3] to two dimensions.
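The identification of the limiting moments with those of a standard bivariate normal law can be checked on small cases. The sketch below is illustrative only and its function names are hypothetical: it evaluates the sum of $L(r, s, \xi, \eta, \zeta)\,\rho^{\eta}$ with $\xi = (r - \eta)/2$ and $\zeta = (s - \eta)/2$, which should equal the mixed moment $E(X^r Y^s)$ of a bivariate normal pair with unit variances and correlation $\rho$.

```python
from math import factorial


def pairing_count(r, s, eta):
    """L(r, s, xi, eta, zeta) = r! s! / (xi! eta! zeta! 2^(xi+zeta)), or 0 if impossible."""
    if eta < 0 or eta > min(r, s) or (r - eta) % 2 or (s - eta) % 2:
        return 0
    xi, zeta = (r - eta) // 2, (s - eta) // 2
    return (factorial(r) * factorial(s)
            // (factorial(xi) * factorial(eta) * factorial(zeta) * 2 ** (xi + zeta)))


def bivariate_normal_moment(r, s, rho):
    """Sum over eta of L * rho^eta: the (r, s) moment for unit variances."""
    return sum(pairing_count(r, s, eta) * rho ** eta for eta in range(min(r, s) + 1))
```

Known special cases give a quick sanity check: the (2, 2) moment is $1 + 2\rho^2$, the (3, 1) moment is $3\rho$, and the (4, 0) moment is 3.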


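It is possible to extend these results to the case of sequences of functions,

(3.19)  $$\{f_i^{(1)}(x_1 \cdots x_k)\} \cdots \{f_i^{(p)}(x_1 \cdots x_k)\}$$

satisfying conditions (A) and (B_1) and (B_p) stated below. Let

(3.20)  $$F_n^{(t)} = \frac{1}{n}\sum_{i=1}^{n} f_i^{(t)}(x_i \cdots x_{i+k-1}), \qquad \bar F^{(t)} = E'(F_n^{(t)}),$$
$$M_2^{(t)} = E'\{[F_n^{(t)} - \bar F^{(t)}]^2\}, \qquad M_{1,1}^{(t_1,t_2)} = E'\{[F_n^{(t_1)} - \bar F^{(t_1)}][F_n^{(t_2)} - \bar F^{(t_2)}]\}.$$

Since the functions $\{f_i^{(t)}(x_1 \cdots x_k)\}$ satisfy the condition (B_1),

(3.21)  $$nM_2^{(t)} \sim_p \mu_2^{(t)} \quad\text{and}\quad \liminf \mu_2^{(t)} > 0,$$

for all t. Let

(3.22)  $$\mu^{(t)} = E(F_n^{(t)}).$$

It can be shown, as in Lemma VI, that when (B_1) holds,

(3.23)  $$nM_{1,1}^{(t_1,t_2)} \sim_p \mu_{1,1}^{(t_1,t_2)},$$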

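where

$$\mu_{1,1}^{(t_1,t_2)} = \frac{1}{n}\sum_{|i-j|<k} E\{f_i^{(t_1)}(x_i \cdots x_{i+k-1})\,f_j^{(t_2)}(x_j \cdots x_{j+k-1})\} - \frac{1}{n}\sum_{|i-j|<k} E\{f_i^{(t_1)}(x_1 \cdots x_k)\,f_j^{(t_2)}(x_{k+1} \cdots x_{2k})\}$$
$$\qquad - \frac{1}{n^2}\sum_{i,j=1}^{n}\sum_{\alpha,\beta=1}^{k} E\{f_i^{(t_1)}(x_1 \cdots x_k)\,f_j^{(t_2)}(x_{\sigma_1} \cdots x_{\sigma_k})\} + k^2 \mu^{(t_1)} \mu^{(t_2)},$$

where $\sigma_1 \cdots \sigma_k$ have just one element common with $1 \cdots k$ and $\sigma_\beta = \alpha$ ($\alpha, \beta \le k$).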
We shall assume condition (B_p) that the matrix

(3.25)  (B_p)  $$(\alpha_{ij}) = \Big(\lim \mu_{1,1}^{(i,j)} \Big/ \sqrt{\mu_2^{(i)} \mu_2^{(j)}}\Big), \qquad i, j = 1, \cdots, p,$$

exists and is a nonsingular correlation matrix. We may then state
THEOREM III. Let $\{f_i^{(t)}(x_1 \cdots x_k)\}$, (t = 1 ⋯ p) be sequences of functions satisfying conditions (A), (B_1) and (B_p). Then the joint nonparametric distribution of

$$(F_n^{(1)} - \bar F^{(1)})/\sqrt{M_2^{(1)}} \ \cdots\ (F_n^{(p)} - \bar F^{(p)})/\sqrt{M_2^{(p)}}$$

converges stochastically to the multivariate normal distribution with means zero and correlation matrix $(\alpha_{ij})$ given by (3.25).


4. Randomised distribution of serial statistics and power function. When the variables $x_1 \cdots x_n$ are independently and identically distributed, the conditional distribution in the universe of permutations $\Gamma_n(x_1 \cdots x_n)$ is uniform. When, however, the variables are correlated or have different distributions, the conditional distribution in $\Gamma_n(x_1 \cdots x_n)$ is not in general uniform. For any nonsymmetric statistic $T(x_1 \cdots x_n)$ which assumes the values

(4.1)  $$T_1, T_2, \cdots, T_N, \qquad N = n!$$

for different points of $\Gamma_n(x_1 \cdots x_n)$, let the conditional probabilities associated with these values be $\pi_1 \cdots \pi_N$ ($\sum \pi_i = 1$). By a randomisation (random permutation) of the sample $x_1 \cdots x_n$, we can make the probabilities of $T_1 \cdots T_N$ all equal to 1/N, whatever be the alternative hypothesis. The repartition (Von Mises [8]) of T also gives the probability distribution of T, when the probabilities of $T_1 \cdots T_N$ are equal, that is for a randomised sample. It will be called the randomised distribution function of the nonsymmetric statistic $T(x_1 \cdots x_n)$. When the variables $x_1 \cdots x_n$ form a Markov process of order p (stationary or not), we shall find the stochastic limiting form of the randomised distribution of a nonsymmetric statistic

(4.2)  $$S(x_1 \cdots x_n) = \frac{1}{n}\sum_{i=1}^{n} f_i(x_i \cdots x_{i+k-1})$$

where the functions $\{f_i(x_1 \cdots x_k)\}$ satisfy the condition

(A')  $$\int \cdots \int |f_i(x_1 \cdots x_k)|^s\, dF_{i_1 \cdots i_k}(x_{i_1}, \cdots, x_{i_k}) < C_s$$

for all systems of values of $i_1 \cdots i_k$ and for all i and s, $F_{i_1 \cdots i_k}(x_{i_1}, \cdots, x_{i_k})$ being the joint distribution function of $x_{i_1}, \cdots, x_{i_k}$.

For the randomised distribution, the expected value E’(S) of the nonsymmetric 
function S is given by 
(4.3)  $$M_1 = E'(S) = \frac{1}{n}\sum_{i=1}^{n} \sum \frac{f_i(x_{i_1} \cdots x_{i_k})}{n^{(k)}}$$

and the moments $M_r$ are given by

(4.4)  $$M_r = E'\{(S - M_1)^r\}, \qquad r = 2, 3, 4, \cdots.$$
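For a very small sample the randomised distribution can be enumerated exactly, which makes the definitions of $M_1$ and $M_2$ concrete. The sketch below rests on assumptions that are not in the paper: the indices are taken circularly, the same function f is used for every i, and all function names are hypothetical. It also compares $M_1$ with the average of f over ordered k-tuples of distinct sample positions, in the spirit of (4.3).

```python
from itertools import permutations
from math import isclose


def serial_statistic(z, f, k):
    """S = (1/n) * sum_i f(z_i, ..., z_{i+k-1}), indices taken circularly."""
    n = len(z)
    return sum(f(*(z[(i + j) % n] for j in range(k))) for i in range(n)) / n


def randomised_moments(x, f, k):
    """Exact M_1 and M_2 over all n! equally likely orderings of x."""
    values = [serial_statistic(z, f, k) for z in permutations(x)]
    m1 = sum(values) / len(values)
    m2 = sum((v - m1) ** 2 for v in values) / len(values)
    return m1, m2


def tuple_average(x, f, k):
    """Average of f over ordered k-tuples of distinct sample positions."""
    tuples = list(permutations(x, k))
    return sum(f(*t) for t in tuples) / len(tuples)


def product(a, b):
    return a * b


x = [1.0, 2.0, 4.0, 7.0, 11.0]
m1, m2 = randomised_moments(x, product, 2)
print(m1, tuple_average(x, product, 2), m2)
```

Exhaustive enumeration is feasible only for small n, since the number of orderings is n!.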


Except for Lemmas III and VI, all the results derived in Section 2, that is, 
Lemmas V, VII, and Theorem I, may be applied to randomised distributions. 
We shall prove Lemmas III(a) and VI(a), generalising III and VI. 


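LEMMA III(a). Let $U = E'\{\phi(x_1 \cdots x_k)\}$. Then for a Markov process of order p, we have, when (A') holds,

(4.5)  $$\operatorname{Var}(U) < k c_2 / n$$

where k is a constant free from n and the function $\phi(x_{i_1}, \cdots, x_{i_k})$, and $c_2$ is the upper bound of the second moment of $\phi(x_{i_1}, \cdots, x_{i_k})$.
PROOF.

$$\operatorname{Var}(U) = E\Big\{\Big[\sum \big(\phi(x_{i_1}, \cdots, x_{i_k}) - \bar\phi_{i_1 \cdots i_k}\big) \big/ n^{(k)}\Big]^2\Big\}$$
$$= (n^{(k)})^{-2} \sum{}' E\{[\phi(x_{i_1}, \cdots, x_{i_k}) - \bar\phi_{i_1 \cdots i_k}][\phi(x_{j_1}, \cdots, x_{j_k}) - \bar\phi_{j_1 \cdots j_k}]\}$$

where $\bar\phi_{i_1 \cdots i_k} = E[\phi(x_{i_1}, \cdots, x_{i_k})]$ and $\sum'$ is a summation for all systems $i_1 \cdots i_k$ and $j_1 \cdots j_k$.
We need only consider such terms in the summation $j_1 \cdots j_k$ for which at least one lies in the ranges $(i_1 - p, i_1 + p) \cdots (i_k - p, i_k + p)$. The number of terms satisfying this condition is $O(n^{2k-1})$. Again

$$E\{[\phi(x_{i_1}, \cdots, x_{i_k}) - \bar\phi_{i_1 \cdots i_k}][\phi(x_{j_1}, \cdots, x_{j_k}) - \bar\phi_{j_1 \cdots j_k}]\} \le \sqrt{\operatorname{Var}[\phi(x_{i_1}, \cdots, x_{i_k})]\,\operatorname{Var}[\phi(x_{j_1}, \cdots, x_{j_k})]} \le c_2$$

since, from assumption (A'), the second moment of $\phi(x_{i_1}, \cdots, x_{i_k})$ is bounded by $c_2$; hence the result.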


From Lemma III(a), proceeding exactly as in Section 2, we have Lemma VI(a).
LEMMA VI(a).
nM, = ;, . y FE filag +++ Cigues) flay +++ Li¢n-1)} 


P oe 
NM \i-j\<k nik+li-il) 


N \i—s\<k ni) 


 - YD. Et filee, +++ 24) SA Zines *** Ligs))} 


I = > Le Et fay 72° Dp) Silae, ted? Xe,)} 


Nn? ijat afnl nl2e—1) 


+ k? ‘ 1 > : El fai, rs zi,)| \" 


(1 tml niki 
where $i_1 \cdots i_k$, $i_{k+1} \cdots i_{2k}$ are any set of 2k different numbers from 1 to n, and $\sigma_1 \cdots \sigma_k$ are any fixed set of different numbers, from 1 to n, which have exactly one element in common with $1 \cdots k$.

We now assume that the variables $x_1 \cdots x_n$ and the functions $\{f_i(x_1 \cdots x_k)\}$ satisfy the condition

(B')  $$\liminf \mu_2 > 0.$$

THEOREM IV. Let $x_1 \cdots x_n$ be a sequence of random variables forming a Markov process of order p (stationary or not) and let $\{f_i(x_1 \cdots x_k)\}$ be a sequence of functions satisfying conditions (A') and (B'). Then the randomised distribution function of

(4.7)  $$\Big[\frac{1}{n}\sum_{i=1}^{n} f_i(z_i \cdots z_{i+k-1}) - M_1\Big] \Big/ \sqrt{M_2}$$

converges stochastically to the normal distribution with mean zero and variance unity, where $(z_1 \cdots z_n)$ is a random permutation of $x_1 \cdots x_n$.

From the randomised distribution of T we shall derive the stochastic asymptotic expression for the power of a test in $\Gamma_n(x_1 \cdots x_n)$, for the alternative hypothesis $H_1$, according to which $x_1 \cdots x_n$ forms a Markov process of order p.

Let the conditional probability density of $x_i$ for given values of $x_{i-1} \cdots x_{i-p}$ be $g_i(x_i \mid x_{i-1} \cdots x_{i-p})$. When only $x_{i-1} \cdots x_{i-j}$ (j < p) are given, let the conditional probability density of $x_i$ be $g_i(x_i \mid x_{i-1} \cdots x_{i-j})$. The joint probability distribution of $x_1 \cdots x_n$ is then given by

(4.8)  $$\prod_{i=1}^{p} g_i(x_i \mid x_{i-1} \cdots x_1) \prod_{i=p+1}^{n} g_i(x_i \mid x_{i-1} \cdots x_{i-p})\; dx_1 \cdots dx_n.$$

We shall assume that the functions $\{\log g_i(x_{j+1} \mid x_j \cdots x_1)\}$ (i = 1 ⋯ n; j = 1 ⋯ p) satisfy the conditions (A') and (B').
A sufficient condition that the functions $\{\log g_i(X_{j+1} \mid X_j \cdots X_1)\}$ satisfy the condition (A') is that there exists a polynomial

(4.9a)  $$P(X_1 \cdots X_{j+1}) = \sum A_{s_1 \cdots s_{j+1}} X_1^{s_1} \cdots X_{j+1}^{s_{j+1}}$$

such that

(4.9b)  $$|\log g_i(X_{j+1} \mid X_j \cdots X_1)| < P(X_1, \cdots, X_{j+1})$$

and also that

(4.9c)  $$\int |x|^s\, g_i(x)\, dx < A_s$$

for all s, independent of i, $g_i(X)$ being the probability density of $x_i$. These conditions are usually satisfied for exponential type of populations considered in statistical theory.
Thus the randomised distribution of the nonsymmetric statistic

(4.10)  $$T = \Big\{\frac{1}{n}\Big[\sum_{i=1}^{p} \log g_i(z_i \mid z_{i-1} \cdots z_1) + \sum_{i=p+1}^{n} \log g_i(z_i \mid z_{i-1} \cdots z_{i-p})\Big] - M_1\Big\} \Big/ \sqrt{M_2}$$

converges stochastically to the normal distribution with mean zero and variance unity, $(z_1 \cdots z_n)$ being a random permutation of $x_1 \cdots x_n$ and $M_1$, $M_2$ being defined by (4.3) and (4.4) for $\{\log g_i(X_i \mid X_{i-1} \cdots X_1)\}$ when i = 1 ⋯ p and for $\{\log g_i(X_i \mid X_{i-1} \cdots X_{i-p})\}$ when i > p.

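At any point $(z_1 \cdots z_n)$ in $\Gamma_n$, the probability according to $H_1$ is

$$\frac{1}{n!\,C'_n} \prod_{i=1}^{p} g_i(z_i \mid z_{i-1} \cdots z_1) \prod_{i=p+1}^{n} g_i(z_i \mid z_{i-1} \cdots z_{i-p}),$$

$C'_n$ independent of the order of $z_1 \cdots z_n$,

$$= \frac{1}{n!\,C'_n} \exp\{n(\sqrt{M_2}\,T + M_1)\} = \frac{1}{n!\,C_n} \exp\{n\sqrt{M_2}\,T\}, \qquad C_n = E'\{\exp(n\sqrt{M_2}\,T)\}.$$

For two constants, T′ and T″,

(4.12)  $$\Pr\{T' < T \le T'' \mid x_1 \cdots x_n\} = \Pr\{T' < T \le T'' \mid X_n\} = \sum_{T' < T \le T''} \frac{\exp(n\sqrt{M_2}\,T)}{n!\,C_n} = \frac{1}{C_n}\int_{T'}^{T''} \exp(n\sqrt{M_2}\,T)\, dG_n(T)$$

where $G_n(T)$ is the randomised distribution function of T in $\Gamma_n(x_1 \cdots x_n)$.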
Now

$$C_n = E'\{\exp(n\sqrt{M_2}\,T)\} \ge \int_a^b \exp(n\sqrt{M_2}\,T)\, dG_n(T) \ge \exp(n\sqrt{M_2}\,a)\,[G_n(b) - G_n(a)]$$

where a and b are constants. Since $G_n(x)$ converges stochastically to

$$(2\pi)^{-1/2}\int_{-\infty}^{x} \exp(-t^2/2)\, dt,$$

which is uniformly continuous,

(4.13)  $$\Big|G_n(x) - (2\pi)^{-1/2}\int_{-\infty}^{x} \exp(-t^2/2)\, dt\Big| < \epsilon$$

with probability greater than 1 − δ, whenever $n > n_0$, uniformly for x. Thus

$$G_n(b) - G_n(a) \ge (2\pi)^{-1/2}\int_a^b \exp(-t^2/2)\, dt - 2\epsilon > k, \qquad k > 0,$$

with probability greater than 1 − δ when $n > n_0$. Hence

$$\Pr\{C_n \ge \exp(na\sqrt{M_2})\,k\} > 1 - \delta.$$

Again

$$\int_{T'}^{T''} \exp(n\sqrt{M_2}\,T)\, dG_n(T) \le \exp(n\sqrt{M_2}\,T'').$$
Thus

$$P\{T' < T \le T'' \mid X_n\} \le \exp(n\sqrt{M_2}\,T'')/C_n \le \exp(n\sqrt{M_2}\,T'') \big/ \{k \exp(n\sqrt{M_2}\,a)\}$$

with probability greater than 1 − δ. Since $\exp(n\sqrt{M_2}\,(T'' - a)) \to 0$ as $n \to \infty$, when a > T″, from (B'), we find that $P\{T' < T \le T'' \mid X_n\}$ converges stochastically to zero, for fixed (T′, T″), as $n \to \infty$. Thus we have

THEOREM V. For the test of randomness of a sequence $X_1 \cdots X_n$ against an alternative $H_1$, for which the sequence forms a Markov process of order p, and where the logarithms of the conditional probabilities satisfy conditions (A') and (B'), the acceptance region T′ < T ≤ T″ is stochastically consistent, T being given by (4.10).

The randomised distribution of a statistic 7 may be used to find a stochastic 
asymptotic form of the power function, so that a nonparametric test criterion 
may be selected, having desirable properties for large samples, on the basis of 
such power functions. 

Two problems will be considered below for illustrative purposes. The first 
problem is concerned with the test for positive circular serial correlation in a 
sequence of random variables $x_1 \cdots x_n$. This problem has been solved by 
Lehmann and Stein [7], in which they obtain the most powerful randomised 
test function. We shall find the stochastic asymptotic power function of the 
corresponding nonrandomised test for large samples. The second problem is 
concerned with a more general type of stochastic pattern. In this case also it 
may be possible to get a most stringent test for small samples, along the lines 
of Lehmann and Stein [7], though this has not been considered before. From 
considerations of the stochastic asymptotic power functions we get an asymp- 
totically most stringent test in this case. 

Consider a sequence of random variables $x_1 \cdots x_n$ with circular serial correlation so that the conditional probability density function of $x_{i+1}$ is

$$(2\pi)^{-1/2}\,\sigma^{-1} \exp\{-(x_{i+1} - b x_i)^2 / 2\sigma^2\} = g(x_{i+1} \mid x_i).$$

We have

$$\frac{1}{n}\sum \log g(x_{i+1} \mid x_i) = -\tfrac12 \log 2\pi - \log\sigma - \frac{1}{2n\sigma^2}\sum (x_{i+1} - b x_i)^2$$
$$= -\tfrac12 \log 2\pi - \log\sigma - \tfrac12 \sigma^{-2}\Big\{(1 + b^2)\sum x_i^2/n - (2b/n)\sum x_i x_{i+1}\Big\},$$

which depends on the nonsymmetric statistic $T = \sum x_i x_{i+1}/n$ only, since $\sum x_i^2$, being a symmetric function of $X_1 \cdots X_n$, has the same value for all points in $\Gamma_n(x_1 \cdots x_n)$.
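The reduction of the mean log density to a function of T and $\sum x_i^2$ can be confirmed numerically. The following is a minimal sketch with hypothetical function names; it assumes the circular convention in which the successor of $x_n$ is $x_1$, and evaluates both forms on an arbitrary ordering of the sample.

```python
import math
import random


def mean_log_density(z, b, sigma):
    """(1/n) * sum_i log g(z_{i+1} | z_i) for the circular Gaussian model."""
    n = len(z)
    total = 0.0
    for i in range(n):
        resid = z[(i + 1) % n] - b * z[i]
        total += (-0.5 * math.log(2 * math.pi) - math.log(sigma)
                  - resid ** 2 / (2 * sigma ** 2))
    return total / n


def via_T(z, b, sigma):
    """Same quantity written through T = sum z_i z_{i+1} / n and sum z_i^2 / n."""
    n = len(z)
    T = sum(z[i] * z[(i + 1) % n] for i in range(n)) / n
    s2 = sum(v * v for v in z) / n
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - ((1 + b * b) * s2 - 2 * b * T) / (2 * sigma ** 2))


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
rng.shuffle(sample)   # any ordering of the same values
print(mean_log_density(sample, 0.3, 1.2), via_T(sample, 0.3, 1.2))
```

Because the sum of squares is the same for every ordering, two orderings of the same sample differ in likelihood only through T.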

As shown by Lehmann and Stein [7], a uniformly most powerful randomised test function exists in this case for values of b > 0 or b < 0 and depends only upon values of $T = \sum x_i x_{i+1}/n$.

We consider only a nonrandomised test. Obviously for a test of significance with first kind of error $\alpha$, the randomised test is equivalent to a nonrandomised test for some values of $\alpha$.

Let $M_1 = E'(T)$, $M_2 = E'[(T - M_1)^2]$ and $G_n(T)$ be the randomised distribution of $(T - M_1)/\sqrt{M_2}$. Then

(4.14)  $$P_b\{T' < T \le T'' \mid X_n\} = \frac{\int_{T'}^{T''} \exp[bn\sigma^{-2}(\sqrt{M_2}\,T + M_1)]\, dG_n(T)}{\int_{-\infty}^{\infty} \exp[bn\sigma^{-2}(\sqrt{M_2}\,T + M_1)]\, dG_n(T)} = \frac{\int_{T'}^{T''} \exp[bn\sigma^{-2}\sqrt{M_2}\,T]\, dG_n(T)}{\int_{-\infty}^{\infty} \exp[bn\sigma^{-2}\sqrt{M_2}\,T]\, dG_n(T)}.$$


We note that Theorem IV can be immediately extended to a set of circularly correlated variables. For any given b (≠ 0), $P_b(T' < T \le T'' \mid X_n)$ converges stochastically to zero. Thus we study the power function for sufficiently small values of b and we consider $b_n = O(1/\sqrt{n})$; say $\lim_{n \to \infty} \sqrt{n}\, b_n = \lambda$, $\lambda \ne 0$. In this case we have

(4.15)  $$\operatorname{Plim}\, nM_2 = \sigma^4,$$

so that the condition (B') holds, and $G_n(T)$ converges stochastically to the normal distribution with mean zero and variance unity. We have also

(4.16)  $$\lambda_n = b_n n \sigma^{-2} \sqrt{M_2} \quad\text{and}\quad \operatorname{Plim} \lambda_n = \lambda.$$


Now

$$\int_{T'}^{T''} \exp(\lambda_n T)\, dG_n(T) = [\exp(\lambda_n T)\,G_n(T)]_{T'}^{T''} - \int_{T'}^{T''} \lambda_n \exp(\lambda_n T)\,G_n(T)\, dT,$$
$$(2\pi)^{-1/2}\int_{T'}^{T''} \exp(\lambda_n T - T^2/2)\, dT = [\exp(\lambda_n T)\,\phi(T)]_{T'}^{T''} - \int_{T'}^{T''} \lambda_n \exp(\lambda_n T)\,\phi(T)\, dT,$$

where $\phi(T) = (2\pi)^{-1/2}\int_{-\infty}^{T} \exp(-t^2/2)\, dt$. Thus

$$\int_{T'}^{T''} \exp(\lambda_n T)\, dG_n(T) - (2\pi)^{-1/2}\int_{T'}^{T''} \exp(\lambda_n T - T^2/2)\, dT$$
$$= [G_n(T'') - \phi(T'')] \exp(\lambda_n T'') - [G_n(T') - \phi(T')] \exp(\lambda_n T') - \int_{T'}^{T''} (G_n(T) - \phi(T))\,\lambda_n \exp(\lambda_n T)\, dT.$$

Since $\Pr\{|G_n(T) - \phi(T)| < \epsilon\} > 1 - \delta$, for $n > n_0(\epsilon, \delta)$,


T” Te? | 
I exp (dn 1) dG,(T) — (2m) [ exp (te T — T?/2) dT | 
y° y? | 


= 
S elexp (A, T”) + exp(A, T’)] + «| ha exp (A, 7) dT 
Tv’ 
— 
< 3e exp (4. T”) S 3¢ [ exp (te T) dT 
ne 


T*? 

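holds with probability greater than 1 − δ. As $\int_{T'}^{T''} \exp(\lambda_n T - T^2/2)\, dT$ is a uniformly continuous function of $\lambda_n$ in (T′, T″), we get

(4.17)  $$P_{\lambda_n/\sqrt{n}}\{T' < T \le T'' \mid X_n\} = \frac{1 + \epsilon_1}{C_n\sqrt{2\pi}} \int_{T'}^{T''} \exp(\lambda T - T^2/2)\, dT$$

with probability > 1 − δ for $n > n_0(\epsilon'')$, where $\epsilon_1$ is a small quantity < ε″, and $C_n = \int_{-\infty}^{\infty} \exp(\lambda_n T)\, dG_n(T)$. Hence

(4.18)  $$P_{\lambda/\sqrt{n}}\{T' < T \le T'' \mid X_n\} \sim_p \frac{\exp(\tfrac12 \lambda^2)}{C_n\sqrt{2\pi}} \int_{T'}^{T''} \exp[-\tfrac12 (T - \lambda)^2]\, dT.$$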
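The passage from the integrand $\exp(\lambda T - T^2/2)$ to the shifted normal form is a completion of the square, $\lambda T - T^2/2 = \lambda^2/2 - (T - \lambda)^2/2$. A short numerical sketch of this step follows; the function names are hypothetical and the midpoint rule is used only for illustration.

```python
import math


def normal_cdf(t):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def tilted_integral(lam, lo, hi, steps=20000):
    """Midpoint-rule value of (2*pi)^(-1/2) * integral of exp(lam*T - T^2/2) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += math.exp(lam * t - t * t / 2)
    return total * h / math.sqrt(2 * math.pi)


def shifted_normal_form(lam, lo, hi):
    """exp(lam^2/2) * [Phi(hi - lam) - Phi(lo - lam)], by completing the square."""
    return math.exp(lam * lam / 2) * (normal_cdf(hi - lam) - normal_cdf(lo - lam))
```

When $C_n$ is close to $\exp(\lambda^2/2)$, the right-hand side of (4.18) is then approximately the probability that a normal variable with mean $\lambda$ and unit variance falls in (T′, T″).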
In order to show that when the parameter $b \sim \lambda/\sqrt{n}$ ($\lambda \ne 0$) the hypothesis $H_1(b = O(1/\sqrt{n}))$ may be discriminated against the hypothesis $H_0(b = 0)$, on the basis of a sample of size n, we prove the following inequality.
THEOREM VI.

$$\Pr\Big\{P_{\lambda/\sqrt{n}}(T' < T \le T'' \mid X_n) < \frac{1 + \epsilon}{\sqrt{2\pi}} \int_{T'}^{T''} \exp[-\tfrac12 (T - \lambda)^2]\, dT\Big\} > 1 - \delta$$

holds for $n > n_0(\epsilon, \delta)$.
PROOF.

$$C_n = \int_{-\infty}^{\infty} \exp(\lambda_n T)\, dG_n(T) \ge \int_{T_1}^{T_2} \exp(\lambda_n T)\, dG_n(T).$$

As shown above,

$$\int_{T_1}^{T_2} \exp(\lambda_n T)\, dG_n(T) \sim_p \frac{\exp(\tfrac12 \lambda^2)}{\sqrt{2\pi}} \int_{T_1}^{T_2} \exp[-\tfrac12 (t - \lambda)^2]\, dt,$$

that is,

$$\int_{T_1}^{T_2} \exp(\lambda_n T)\, dG_n(T) > \exp(\tfrac12 \lambda^2)(1 - \epsilon_1)$$

holds with probability greater than 1 − δ/2 for n sufficiently large. Thus

$$\Pr\{C_n > \exp(\tfrac12 \lambda^2)(1 - \epsilon_1)\} > 1 - \delta/2.$$

From (4.17) the result follows when $\epsilon_1$ and ε″ are sufficiently small.

In the case of a single parameter, the optimum test procedure is obtained 
without the help of (4.18), the stochastically asymptotic power function, but 
we shall see that in the multiparameter case the stochastically asymptotic 
power function is a useful tool for the purpose of finding a test with good power 
properties for large samples. 

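Consider a more general stochastic pattern, in which the probability density
of $x_i$ depends upon $x_{i-1} \cdots x_{i-p}$ and is given by

$$g(x_i \mid x_{i-1} \cdots x_{i-p}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\{-(x_i - b_1 x_{i-1} - \cdots - b_p x_{i-p})^2/2\sigma^2\}, \tag{4.19}$$

the variables $x_1 \cdots x_n$ being considered in circular order.

$$\frac{1}{n}\sum_{i} \log g(x_i \mid x_{i-1} \cdots x_{i-p}) = -\frac{\log 2\pi}{2} - \log\sigma - \frac{1}{2n\sigma^2}\sum_{i}(x_i - b_1 x_{i-1} - \cdots - b_p x_{i-p})^2$$
$$= -\frac{\log 2\pi}{2} - \log\sigma - \frac{1}{2\sigma^2}\left\{(1 + b_1^2 + \cdots + b_p^2)\frac{\sum x_i^2}{n} - 2\sum_{j=1}^{p}\Big(b_0 b_j - \sum_{k=1}^{p-j} b_k b_{k+j}\Big) r_j\right\},$$

where $r_j = \sum_i x_i x_{i+j}/n$ and $b_0 = 1$.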

Since $\sum x_i^2$ is a symmetric function of $x_1 \cdots x_n$, the probability distribution
in $T_n(x_1 \cdots x_n)$ depends upon $r_1 \cdots r_p$ only and thus we need only consider
the space $R_p$, consisting of all points $(r_1 \cdots r_p)$.

Let $r_1' \cdots r_p'$ be standardised variables corresponding to $r_1 \cdots r_p$, that is,
$r_i' = (r_i - M_1^{(i)})/\sqrt{M_2^{(i)}}$ where $M_1^{(i)} = E'(r_i)$ and $M_2^{(i)} = E'[(r_i - M_1^{(i)})^2]$. Let
$G_n(r_1' \cdots r_p')$ be the randomised joint distribution of $r_1' \cdots r_p'$ in $T_n(x_1 \cdots x_n)$.
Then $G_n(r_1' \cdots r_p')$ converges stochastically to the normal distribution with means
zero and dispersion matrix $(\alpha_{ij})$ and, as in Theorem V, any bounded region of
acceptance $C$ in $R_p$ gives a stochastically consistent test. In particular, the
region of acceptance


(4.20) 


gives a stochastically consistent test, for given values of b; --- b,. 
In order to find a stochastically asymptotic expression for the power function,
we consider $b_i = O(1/\sqrt{n})$, say $b_i = \lambda_n^{(i)}/\sqrt{n}$, and $\lim \lambda_n^{(i)} = \lambda^{(i)}$. We have now

$$\operatorname{Plim}\, n M_2^{(i)} = \sigma^4, \qquad \operatorname{Plim}\, n M_{11}^{(ij)} = 0 \quad (i \ne j), \tag{4.21}$$

so that $(B_1)$ holds and $\alpha_{ij} = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker $\delta$ symbol. For any
bounded region $C$ in $R_p$, we may show, as in (4.18),


(Qr)”'*a"C,, 


/ exp (>_A,r’)) exp ( — 


€ 


Pog Vn r\p Vn (C) ms p 


(4.22) 


where 


” gtd 


C, = | exp (5 > b, bir, iV Mo ») Gale tee 
R \9 r 


» 
Corresponding to Theorem VI we have 
THEOREM VI(a).

l+e 


(Qer)™!2q™ 


[ exp (- tL & - *)) dé-déy> > 1-8 
Cc ao j 


Pr < Pog IVn ALF Vv iy 


(4.23) 


for any bounded region $C$ in $R_p$ and for sufficiently large $n > n_0(\varepsilon, \delta)$.

For fixed values of $b_1 \cdots b_p$ there exists a most powerful region, but there is
no uniformly most powerful region for all values of $b_1 \cdots b_p$. But we may apply
the well known methods of multivariate normal theory to the stochastically
asymptotic power function and select an optimum test. In the present case we
may consider the most stringent test (Wald [14]) or the most powerful test on
the average (Nandi [10]), the asymptotic power function being averaged over
the spheres $(\lambda^{(1)})^2 + \cdots + (\lambda^{(p)})^2 = \lambda^2$. The region of acceptance with these
properties is given by

$$(r_1')^2 + \cdots + (r_p')^2 < a. \tag{4.24}$$


In concluding this paper the author desires to put on record his appreciation
of the helpful criticisms of the reviewer of the Annals, which led to considerable
improvement in the presentation.

ON THE ESTIMATION OF REGRESSION COEFFICIENTS IN
THE CASE OF AN AUTOCORRELATED DISTURBANCE


By ULF GRENANDER


University of California, Berkeley and University of Stockholm


1. Introduction. The following stochastic model has been used in various
applications. A time series $x_\nu$ is the sum of a mean value $m_\nu$ and a disturbance
$y_\nu$. The disturbance is supposed to consist of independent and identically
distributed stochastic variables with mean zero. The mean value of $x_\nu$ is a linear
combination

$$m_\nu = \sum_{n=1}^{s} c^{(n)} \varphi_\nu^{(n)}$$

of certain known sequences $\{\varphi_\nu^{(n)}\}$, $n = 1, 2, \cdots, s$, the regression variables, but
with unknown regression coefficients $c^{(n)}$. For an observed sample $x_0, x_1, \cdots, x_N$,
the problem of estimating the $c$'s is usually solved by applying the method of
least squares (L.S.). As is well known this procedure is optimal in the sense that
the estimates obtained are best linear unbiased (B.L.U.) estimates.

The problem studied in this paper arises when the disturbance is still stationary 
but allowed to be autocorrelated. If the correlation matrix of y, is known, the 
B.L.U. estimates can be constructed although their form is not so simple as in 
the first case. It is no longer generally true that they coincide with the L.S. 
estimates. In the applications the correlation matrix of the disturbance is seldom 
known. As this is not needed for the construction of the L.S. estimates which 
are optimal in the case of a nonautocorrelated disturbance, it seems natural to 
ask if they have some optimum property for large samples. 

Looking at the problem from another point of view, we ask if it can happen 
that the knowledge of the correlation matrix does not contain any information 
relative to our problem of inference for large samples. 

The main result is given in Theorems 3 and 5 and their corollaries. They express 
the asymptotic efficiency of the L.S. estimates in terms of the spectrum of the 
process and the spectrum of the regression sequence to be defined below. As a 
consequence of these results, we show that in the case of trigonometric or poly- 
nomial regression the L.S. estimates are asymptotically efficient. 

The problem of estimating a constant mean value of a stationary process, 
studied in [2] and [3], can be considered as a special case of our present problem. 
Several authors have studied related questions. We refer especially to [5]. 

2. The disturbance. The process $x_\nu$ is observed at the points $\nu = 0, 1, \cdots, N$.
It is convenient to allow the process to take complex values.

We shall suppose that the disturbance $y_\nu$ has mean zero and finite moments
of the second order. Introduce the covariance matrix

$$M_N = \begin{pmatrix} \rho_{00} & \rho_{01} & \cdots & \rho_{0N} \\ \rho_{10} & \rho_{11} & \cdots & \rho_{1N} \\ \vdots & \vdots & & \vdots \\ \rho_{N0} & \rho_{N1} & \cdots & \rho_{NN} \end{pmatrix}, \qquad \rho_{\nu\mu} = E y_\nu \bar y_\mu,$$

where, as usual, $\bar y_\mu$ is the complex conjugate of $y_\mu$. Supposing the disturbance to
be stationary (in the wide sense) we can write

$$\rho_{\nu\mu} = \rho_{\nu-\mu} = E y_\nu \bar y_\mu.$$

As is well known there exists a bounded, nondecreasing function $F(\lambda)$ defined
in the interval $(-\pi, \pi)$ such that

$$\rho_\nu = \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF(\lambda).$$

Further the process itself has a similar representation

$$y_\nu = \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dZ(\lambda),$$

where $Z(\lambda)$ is an orthogonal process and

$$E|Z(\lambda_2) - Z(\lambda_1)|^2 = |F(\lambda_2) - F(\lambda_1)|.$$

We shall consider the class of processes which have an absolutely continuous
spectrum with a continuous and positive spectral density. Then

$$F(b) - F(a) = \int_a^b f(\lambda)\,d\lambda,$$

where $f(\lambda)$ is the spectral density. This class of processes shall be denoted by $Y$.
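
To make the spectral representation of the covariances concrete, the following
minimal sketch recovers $\rho_n = \int e^{in\lambda} f(\lambda)\,d\lambda$ numerically for one member of
the class $Y$. The first-order autoregressive density used here, and the function
names, are assumptions introduced only for this illustration; they are not taken
from the paper.

```python
import numpy as np


def ar1_spectral_density(lam, a):
    """Spectral density of y_nu = a*y_{nu-1} + eps_nu with unit innovation variance."""
    return 1.0 / (2.0 * np.pi * np.abs(1.0 - a * np.exp(-1j * lam)) ** 2)


def covariance_from_density(density, n, grid_size=4096):
    """Approximate rho_n = integral over (-pi, pi) of exp(i*n*lam) * f(lam) d lam.

    A uniform rectangle rule is used; for smooth periodic integrands it is
    very accurate.
    """
    lam = -np.pi + 2.0 * np.pi * np.arange(grid_size) / grid_size
    return (2.0 * np.pi / grid_size) * np.sum(np.exp(1j * n * lam) * density(lam))


if __name__ == "__main__":
    a = 0.6
    for n in range(4):
        approx = covariance_from_density(lambda l: ar1_spectral_density(l, a), n)
        exact = a ** n / (1.0 - a * a)  # known covariance of this AR(1) scheme
        print(n, approx.real, exact)
```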


3. The regression variables. In order not to overshadow the idea of the proof,
we shall at first consider the case of only one regression variable $\varphi_\nu$. We shall
deal only with the case when there exists a consistent estimate of $c$. We shall
later see that this implies that

$$\sum_{\nu=0}^{\infty} |\varphi_\nu|^2 = \infty.$$

As we shall see in Section 5, the asymptotic efficiency of the L.S. estimates is
determined by certain properties of the sequence $\varphi_0, \varphi_1, \varphi_2, \cdots$. In order to
specify these properties, the most straightforward thing to do would be to
assume that the sequence has a Fourier-Stieltjes representation. But then we could
not deal with even such a simple case as $\varphi_\nu = A + B\nu$, $B \ne 0$. Instead we shall
use a method which is an extension of generalized harmonic analysis.

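Introduce the notation $\Phi(N) = \sum_{\nu=0}^{N} |\varphi_\nu|^2$. This is an unbounded nondecreasing
sequence for $N = 0, 1, \cdots$.
DEFINITION 1. If for every $n \ge 0$ the limits

$$R_n = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \varphi_{\nu+n}\bar\varphi_\nu$$

exist, we shall call the sequence $\{\varphi_\nu\}$ regular and write $\{\varphi_\nu\} \in R$.
DEFINITION 2. Let $\{\varphi_\nu\} \in R$. If for every $h \ge 0$ the sequences $\{\varphi_{\nu+h}\}$ all
generate the same set of limits,

$$\lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \varphi_{\nu+h+n}\bar\varphi_{\nu+h} = R_n,$$

we shall say that $\{\varphi_\nu\}$ is a stationary sequence.
DEFINITION 3. If the sequence $\Phi(N)$ satisfies

$$\lim_{N\to\infty} \frac{\Phi(N + h)}{\Phi(N)} = 1$$

for every integer $h$ we shall say that $\Phi(N)$ is slowly increasing.
THEOREM 1. In order that a regular sequence be stationary it is necessary and
sufficient that $\Phi(N)$ be slowly increasing.
PROOF. Suppose that $\{\varphi_\nu\}$ is stationary. Then

$$R_n = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \varphi_{\nu+n}\bar\varphi_\nu = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \varphi_{\nu+n+h}\bar\varphi_{\nu+h}.$$

Since $\Phi(N) \to \infty$ as $N \to \infty$, we get

$$\lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \varphi_{\nu+n+h}\bar\varphi_{\nu+h} = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N+h} \varphi_{\nu+n}\bar\varphi_\nu.$$

Thus

$$R_n = \lim_{N\to\infty} \frac{\Phi(N + h)}{\Phi(N)}\,R_n.$$

As $R_0 = 1$ the sequence $\Phi(N)$ must be slowly increasing. By reversing the order
of the proof the sufficiency of the condition follows.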
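
The definitions above can be checked numerically on finite stretches of a
sequence. The sketch below is only an illustration under stated assumptions: it
takes $R_n$ with the conjugate on the unshifted factor, and the helper names
`lag_ratio` and `slowly_increasing_ratio` are hypothetical. It contrasts the
linear trend $\varphi_\nu = A + B\nu$, for which $\Phi(N+h)/\Phi(N)$ tends to 1, with a
geometric sequence, for which it does not.

```python
import numpy as np


def lag_ratio(phi, n, N):
    """(1/Phi(N)) * sum_{nu=0}^{N} phi_{nu+n} * conj(phi_nu); needs len(phi) > N + n."""
    num = np.sum(phi[n:N + n + 1] * np.conj(phi[:N + 1]))
    return num / np.sum(np.abs(phi[:N + 1]) ** 2)


def slowly_increasing_ratio(phi, N, h):
    """Phi(N + h) / Phi(N), where Phi(N) = sum_{nu=0}^{N} |phi_nu|^2."""
    big_phi = np.cumsum(np.abs(phi) ** 2)
    return big_phi[N + h] / big_phi[N]


if __name__ == "__main__":
    nu = np.arange(20010, dtype=float)
    trend = 1.0 + 2.0 * nu          # phi_nu = A + B*nu with A = 1, B = 2
    print(lag_ratio(trend, 3, 20000), slowly_increasing_ratio(trend, 20000, 5))
    geometric = 2.0 ** np.arange(50)
    print(slowly_increasing_ratio(geometric, 40, 1))  # stays near 4, not 1
```

For the linear trend every $R_n$ is close to 1, so its spectrum is concentrated
at $\lambda = 0$, while the geometric sequence fails Definition 3.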
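THEOREM 2. A stationary sequence has a spectrum in the sense that for every $n$

$$R_n = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \varphi_{\nu+n}\bar\varphi_\nu = \int_{-\pi}^{\pi} e^{in\lambda}\,d\varphi(\lambda), \tag{1}$$

where $\varphi(\lambda)$ is a distribution function defined in $(-\pi, \pi)$.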

PROOF. For $n < 0$ some of the first terms in the sum appearing in (1) are not
defined. Since $\Phi(N)$ tends to infinity with $N$, we can assign arbitrary values to
them as the limits do not depend upon them. Defining $\varphi_\nu = 0$ for $\nu < 0$, we see
that for $n < 0$

$$R_n = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \varphi_{\nu-|n|}\bar\varphi_\nu = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N-|n|} \varphi_\nu\bar\varphi_{\nu+|n|}$$
$$= \lim_{N\to\infty} \frac{\Phi(N - |n|)}{\Phi(N)}\,\frac{1}{\Phi(N - |n|)} \sum_{\nu=0}^{N-|n|} \varphi_\nu\bar\varphi_{\nu+|n|}.$$

Thus, since $\Phi(N)$ is slowly increasing, we see that $R_{-n} = \bar R_n$, that is, the matrix
$R_q = \{R_{n-m};\ n, m = 0, 1, \cdots, q\}$ is Hermitian.
A similar reasoning proves the second equality in

$$\sum_{n,m=1}^{q} R_{n-m}\,a_n\bar a_m = \lim_{N\to\infty} \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} \Big|\sum_{n=1}^{q} a_n\varphi_{\nu+n}\Big|^2 \ge 0.$$

As all the matrices $R_q$ are then nonnegative, the existence of a bounded
nondecreasing function $\varphi(\lambda)$ satisfying (1) follows. Because $1 = R_0 = \int_{-\pi}^{\pi} d\varphi(\lambda)$ we
see that $\varphi(\lambda)$ is a distribution function in $(-\pi, \pi)$.

The set of points of increase of $\varphi(\lambda)$ is called the spectrum of the sequence
$\{\varphi_\nu\}$ and is denoted $S(\varphi)$. Although a stationary sequence determines the spectral
distribution function $\varphi(\lambda)$, it is evident that the sequence is far from determined
by $\varphi(\lambda)$. This is equivalent to the fact that the covariance matrix of a stochastic
process does not determine the realization of it. Hence to a given spectrum
corresponds a multitude of possible sequences.
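
A small numerical sketch of the last remark: the two different sequences
$\cos(\nu\lambda_0)$ and $\sin(\nu\lambda_0)$ produce, to within finite-sample error, the same
limits $R_n = \cos(n\lambda_0)$, that is, the same spectral distribution with mass
$\tfrac12$ at each of $\pm\lambda_0$. The helper name `sample_R` and the choice
$\lambda_0 = 0.7$ are assumptions made only for this example.

```python
import numpy as np


def sample_R(phi, n, N):
    """Finite-N version of R_n: (1/Phi(N)) * sum_{nu=0}^{N} phi_{nu+n} * conj(phi_nu)."""
    num = np.sum(phi[n:N + n + 1] * np.conj(phi[:N + 1]))
    return num / np.sum(np.abs(phi[:N + 1]) ** 2)


if __name__ == "__main__":
    lam0, N = 0.7, 50000
    nu = np.arange(N + 10, dtype=float)
    cos_seq = np.cos(lam0 * nu)
    sin_seq = np.sin(lam0 * nu)
    for n in range(4):
        # Both columns approach cos(n * lam0) although the sequences differ.
        print(n, sample_R(cos_seq, n, N), sample_R(sin_seq, n, N), np.cos(n * lam0))
```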

In Sections 8 and 9 we shall study the spectra of sequences that appear in 
certain cases as regression variables. 


4. The L.S. estimate. The L.S. estimate coincides with the B.L.U. estimate
calculated under the hypothesis that the disturbance is uncorrelated. It is

$$c_{LS} = \sum_{\nu=0}^{N} x_\nu\bar\varphi_\nu \Big/ \sum_{\nu=0}^{N} |\varphi_\nu|^2$$

and its variance, still calculated under the same hypothesis, is

$$D^2[c_{LS}] = \frac{1}{\sum_{\nu=0}^{N} |\varphi_\nu|^2} = \frac{1}{\Phi(N)}. \tag{2}$$

In the following sections we shall study the asymptotic behavior of the L.S.
estimate. As we deal with a linear problem of inference it is natural to define the
efficiency of a linear estimate $a^*$ as

$$e_N(a^*) = D^2[a^*_{BLU}]/D^2[a^*],$$

where $a^*_{BLU}$ denotes the B.L.U. estimate. The asymptotic efficiency $e(a^*)$ is taken
as the limit of $e_N(a^*)$ if this limit exists.

Because of the linear nature of the problem, it is also natural to call a linear
estimate consistent if and only if it converges in the mean to the true value.

It is clear that in the case of a normal process these linear definitions coincide
with the usual ones. We shall, however, not make any assumptions regarding the
distribution of the process beyond properties of the first and second moments.


Consider two spectral intensities $f(\lambda)$ and $g(\lambda)$ corresponding to two processes
in the class $Y$. Introduce

$$\max f(\lambda) = f_2, \quad \max g(\lambda) = g_2, \qquad \min f(\lambda) = f_1, \quad \min g(\lambda) = g_1.$$

For a certain linear combination $c^* = \sum_{\nu=0}^{N} c_\nu x_\nu$ we have, using the spectral
representation of the process,

$$c^* - Ec^* = \int_{-\pi}^{\pi} \sum_{\nu=0}^{N} c_\nu e^{i\nu\lambda}\,dZ(\lambda).$$

Hence, if $f_t(\lambda)$ is the true spectral density,

$$D^2[c^*] = \int_{-\pi}^{\pi} \Big|\sum_{\nu=0}^{N} c_\nu e^{i\nu\lambda}\Big|^2 f_t(\lambda)\,d\lambda.$$

Calculating the variance under the two hypotheses $f_t(\lambda) = f(\lambda)$ and $f_t(\lambda) = g(\lambda)$
we have

$$0 < \frac{f_1}{g_2} \le \frac{D_f^2[c^*]}{D_g^2[c^*]} = \frac{\int_{-\pi}^{\pi} |\sum c_\nu e^{i\nu\lambda}|^2 f(\lambda)\,d\lambda}{\int_{-\pi}^{\pi} |\sum c_\nu e^{i\nu\lambda}|^2 g(\lambda)\,d\lambda} \le \frac{f_2}{g_1}. \tag{3}$$

Suppose that $\Phi(N) \to \infty$. Taking $g(\lambda) = 1/2\pi$ and $f(\lambda)$ as the true spectral
density, we get from (3)

$$D_f^2[c_{LS}] \le 2\pi f_2/\Phi(N) \to 0,$$

so that there exists a consistent estimate of $c$.
On the other hand, suppose that $c^*$ is a consistent estimate of $c$. Then there
exists a consistent unbiased estimate, say $c_1^*$. But it follows from (3) that

$$\frac{1}{\Phi(N)} = D_g^2[c_{LS}] \le D_g^2[c_1^*] \le \frac{1}{2\pi f_1}\,D_f^2[c_1^*] \to 0.$$

Thus, the existence of a consistent estimate is equivalent to $\Phi(N) \to \infty$.
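
The two-sided bound that follows from (3) with $g(\lambda) = 1/2\pi$ can be checked
directly on a finite sample. The sketch below is an illustration only: it
assumes a real first-order autoregressive disturbance with unit innovation
variance, for which $2\pi f_2 = (1 - |a|)^{-2}$ and $2\pi f_1 = (1 + |a|)^{-2}$, and all
function names are hypothetical.

```python
import numpy as np


def ar1_covariance_matrix(size, a):
    """Covariance matrix {rho_{nu-mu}} of y_nu = a*y_{nu-1} + eps_nu, Var(eps) = 1."""
    idx = np.arange(size)
    return a ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - a * a)


def ls_variance(phi, cov):
    """Exact variance of c_LS = sum x_nu conj(phi_nu) / Phi(N) under covariance cov."""
    big_phi = np.sum(np.abs(phi) ** 2)
    return np.real(np.conj(phi) @ cov @ phi) / big_phi ** 2


def variance_bounds(phi, a):
    """Lower and upper bounds 2*pi*f_1/Phi(N) and 2*pi*f_2/Phi(N) for the AR(1) density."""
    big_phi = np.sum(np.abs(phi) ** 2)
    return 1.0 / ((1.0 + abs(a)) ** 2 * big_phi), 1.0 / ((1.0 - abs(a)) ** 2 * big_phi)


if __name__ == "__main__":
    a = 0.7
    phi = 1.0 + 0.1 * np.arange(200)
    low, high = variance_bounds(phi, a)
    print(low, ls_variance(phi, ar1_covariance_matrix(len(phi), a)), high)
```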


5. The asymptotic efficiency of the L.S. estimate.

THEOREM 3. Suppose that

(a) the regression sequence $\varphi_\nu$ is stationary and denote its spectral distribution
function by $\varphi(\lambda)$.

(b) the disturbance $y_\nu \in Y$ and denote its spectral density by $f(\lambda)$.

Then the asymptotic efficiency of the L.S. estimate is given by the expression

$$e(c_{LS}) = \frac{1}{\displaystyle\int_{-\pi}^{\pi} f(\lambda)\,d\varphi(\lambda) \int_{-\pi}^{\pi} \frac{1}{f(\lambda)}\,d\varphi(\lambda)}. \tag{4}$$

PROOF. We shall carry out the proof in two stages, first proving (4) for any
disturbance of the autoregressive type, and then extending it to any process in
the class $Y$.

Suppose that the disturbance is generated by the autoregressive scheme

$$a_0 y_{\nu+p} + a_1 y_{\nu+p-1} + \cdots + a_p y_\nu = \epsilon_\nu$$

with $E\epsilon_\nu = 0$ and $E\epsilon_\nu\bar\epsilon_\mu = \delta_{\nu\mu}$. As usual we shall suppose that the roots of the
characteristic equation

$$a_0 z^p + a_1 z^{p-1} + \cdots + a_p = 0$$

are all of modulus less than one. We then know that $Ey_\mu\bar\epsilon_\nu = 0$ for $\mu < \nu + p$.
The spectral density of the $y$-process is

$$f(\lambda) = \frac{1}{2\pi\,\big|\sum_{\nu=0}^{p} a_\nu e^{-i\nu\lambda}\big|^2}. \tag{5}$$


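There is no linear relation between the $y$'s. We can then use the Gram-Schmidt
procedure to orthonormalize the sequence $y_0, y_1, y_2, \cdots, y_{p-1}$ and get

$$c_{00}y_0 = \eta_0$$
$$c_{11}y_1 + c_{10}y_0 = \eta_1$$
$$\cdots$$
$$c_{p-1,p-1}y_{p-1} + c_{p-1,p-2}y_{p-2} + \cdots + c_{p-1,0}y_0 = \eta_{p-1}$$

where

$$E\eta_\nu\bar\eta_\mu = \delta_{\nu\mu}.$$

As $\epsilon_\nu$ is orthogonal to $\eta_0, \eta_1, \cdots, \eta_{p-1}$ for $\nu \ge 0$ we see that putting $\eta_\nu = \epsilon_{\nu-p}$ for
$\nu \ge p$, the sequence $(\eta_0, \eta_1, \eta_2, \cdots$ ad inf.$)$ is orthonormal. Every $y_\nu$, $\nu \ge 0$,
is a linear combination of $\eta_\nu, \eta_{\nu-1}, \cdots, \eta_0$.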

We have the relation

$$x_\nu = c\varphi_\nu + y_\nu, \qquad \nu \ge 0,$$

and, having observed a sample $x_0, x_1, \cdots, x_N$, we want to consider an estimate
$c^*$ of $c$

$$c^* = \sum_{\nu=0}^{N} c_\nu x_\nu = c\sum_{\nu=0}^{N} c_\nu\varphi_\nu + \sum_{\nu=0}^{N} c_\nu y_\nu.$$

Let us call $L$ the linear transformation that carries $(y_0, y_1, \cdots, y_N)$ over into
$(\eta_0, \eta_1, \cdots, \eta_N)$. The corresponding matrix has then the rows $(c_{00}, 0, 0, \cdots, 0)$;
$(c_{10}, c_{11}, 0, \cdots, 0)$; $\cdots$; $(c_{p-1,0}, c_{p-1,1}, \cdots, c_{p-1,p-1}, 0, \cdots, 0)$;
$(a_p, a_{p-1}, \cdots, a_0, 0, \cdots, 0)$ and so on. Introduce

$$L(x_0, x_1, \cdots, x_N) = (\xi_0, \xi_1, \cdots, \xi_N),$$
$$L(\varphi_0, \varphi_1, \cdots, \varphi_N) = (\beta_0, \beta_1, \cdots, \beta_N).$$


As $L$ is nonsingular, every linear estimate can be written as

$$c^* = \sum_{\nu=0}^{N} \gamma_\nu\xi_\nu = c\sum_{\nu=0}^{N} \gamma_\nu\beta_\nu + \sum_{\nu=0}^{N} \gamma_\nu\eta_\nu.$$

But now the $\eta$'s are uncorrelated and all have unit variance. Because of (2) we
then see that the variance of the B.L.U. estimate is

$$D^2[c^*_{BLU}] = \frac{1}{\sum_{\nu=0}^{N} |\beta_\nu|^2}. \tag{6}$$
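
As an illustration of the transformation $L$, the sketch below builds a whitening
matrix for a first-order autoregressive disturbance from a Cholesky
factorization and evaluates (6). This is a numerical stand-in for the
Gram-Schmidt construction in the text, not the author's procedure; the AR(1)
model with real coefficient `a` and all function names are assumptions made for
the example.

```python
import numpy as np


def ar1_covariance_matrix(size, a):
    """Covariance matrix of y_nu = a*y_{nu-1} + eps_nu with unit innovation variance."""
    idx = np.arange(size)
    return a ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - a * a)


def whitening_matrix(cov):
    """Lower-triangular L with L @ cov @ L^H = I (inverse of the Cholesky factor)."""
    return np.linalg.inv(np.linalg.cholesky(cov))


def blu_variance(phi, cov):
    """Variance (6): 1 / sum |beta_nu|^2, where beta = L @ phi."""
    beta = whitening_matrix(cov) @ phi
    return 1.0 / np.sum(np.abs(beta) ** 2)


if __name__ == "__main__":
    a, size = 0.5, 8
    cov = ar1_covariance_matrix(size, a)
    L = whitening_matrix(cov)
    # After the first row, each row reads (..., -a, 1, 0, ...): eta_nu = y_nu - a*y_{nu-1}.
    print(np.round(L, 3))
    print(blu_variance(np.ones(size), cov))
```

For this scheme the rows of `L` beyond the first reproduce the autoregressive
coefficients, in line with the description of the rows $(a_p, \cdots, a_0, 0, \cdots, 0)$.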


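To get an asymptotic expression for the minimum variance for large
values of $N$ we note that the $p$ first terms in the sum in the denominator of (6)
contribute only a finite amount. Hence we need not take them into account as
the sum diverges because of what has been said in Section 4. We have

$$\frac{1}{\Phi(N)} \sum_{\nu=p}^{N} |\beta_\nu|^2 = \frac{1}{\Phi(N)} \sum_{\nu=p}^{N} \Big|\sum_{\mu=0}^{p} a_\mu\varphi_{\nu-\mu}\Big|^2.$$

Hence

$$\lim_{N\to\infty} \Phi^{-1}(N)\,D^{-2}[c^*_{BLU}] = \sum_{\nu,\mu=0}^{p} a_\nu\bar a_\mu R_{\mu-\nu}.$$

Using the spectral representation of the $R$'s we get

$$\lim_{N\to\infty} \Phi^{-1}(N)\,D^{-2}[c^*_{BLU}] = \int_{-\pi}^{\pi} \Big|\sum_{\nu=0}^{p} a_\nu e^{-i\nu\lambda}\Big|^2 d\varphi(\lambda).$$

Combined with (5), this gives

$$D^2[c^*_{BLU}] \sim \frac{2\pi}{\Phi(N)}\,\frac{1}{\displaystyle\int_{-\pi}^{\pi} \frac{1}{f(\lambda)}\,d\varphi(\lambda)}. \tag{7}$$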
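Let us now derive an asymptotic expression for the variance of the L.S.
estimate. We have

$$c_{LS} - c = \frac{1}{\Phi(N)} \sum_{\nu=0}^{N} y_\nu\bar\varphi_\nu$$

and hence

$$D^2[c_{LS}] = \frac{1}{\Phi^2(N)} \sum_{\nu,\mu=0}^{N} \varphi_\nu\bar\varphi_\mu\,\rho_{\mu-\nu}.$$

Thus

$$\Phi(N)\,D^2[c_{LS}] = \sum_{n=0}^{N} \rho_n\,\frac{1}{\Phi(N)} \sum_{\nu=0}^{N-n} \varphi_\nu\bar\varphi_{\nu+n} + \sum_{n=-N}^{-1} \rho_n\,\frac{1}{\Phi(N)} \sum_{\nu=-n}^{N} \varphi_\nu\bar\varphi_{\nu+n}. \tag{8}$$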

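As $\rho_n$ is dominated by some $Ka^{|n|}$ with $0 < a < 1$, and as

$$\Big|\frac{1}{\Phi(N)} \sum_{\nu=0}^{N-n} \varphi_\nu\bar\varphi_{\nu+n}\Big| \le \sqrt{\frac{1}{\Phi(N)} \sum_{\nu=0}^{N-n} |\varphi_\nu|^2}\,\sqrt{\frac{1}{\Phi(N)} \sum_{\nu=n}^{N} |\varphi_\nu|^2} \le 1,$$

we can perform the limit operation in each term of (8). We then get

$$\lim_{N\to\infty} \Phi(N)\,D^2[c_{LS}] = \sum_{n=-\infty}^{\infty} \rho_n R_{-n} = \sum_{n=-\infty}^{\infty} \rho_n \int_{-\pi}^{\pi} e^{-in\lambda}\,d\varphi(\lambda).$$

As we have

$$f(\lambda) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} \rho_n e^{-in\lambda},$$

where the sum is uniformly convergent, we get

$$D^2[c_{LS}] \sim \frac{2\pi}{\Phi(N)} \int_{-\pi}^{\pi} f(\lambda)\,d\varphi(\lambda). \tag{9}$$

Combining (7) and (9) we have

$$e(c_{LS}) = \frac{1}{\displaystyle\int_{-\pi}^{\pi} f(\lambda)\,d\varphi(\lambda) \int_{-\pi}^{\pi} \frac{1}{f(\lambda)}\,d\varphi(\lambda)},$$

which proves the theorem for any autoregressive scheme.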
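
The limit (4) can be compared with an exact finite-sample efficiency. The
sketch below is an illustration under stated assumptions, not part of the
proof: the disturbance is a real AR(1) scheme, the regression sequence is
$\varphi_\nu = e^{i\nu\lambda_1} + e^{i\nu\lambda_2}$ (whose spectral distribution puts mass $\tfrac12$ at each
of $\lambda_1, \lambda_2$), and every function name is hypothetical.

```python
import numpy as np


def ar1_covariance_matrix(size, a):
    """Covariance matrix of y_nu = a*y_{nu-1} + eps_nu with unit innovation variance."""
    idx = np.arange(size)
    return a ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - a * a)


def ar1_density(lam, a):
    """Spectral density of the same AR(1) scheme."""
    return 1.0 / (2.0 * np.pi * (1.0 - 2.0 * a * np.cos(lam) + a * a))


def finite_sample_efficiency(phi, cov):
    """e_N = D^2[c_BLU] / D^2[c_LS], both computed exactly from cov."""
    big_phi = np.sum(np.abs(phi) ** 2)
    var_ls = np.real(np.conj(phi) @ cov @ phi) / big_phi ** 2
    var_blu = 1.0 / np.real(np.conj(phi) @ np.linalg.solve(cov, phi))
    return var_blu / var_ls


def asymptotic_efficiency(weights, freqs, a):
    """Formula (4) for a purely discrete spectral distribution with the given masses."""
    w = np.asarray(weights, dtype=float)
    f = ar1_density(np.asarray(freqs, dtype=float), a)
    return 1.0 / (np.sum(w * f) * np.sum(w / f))


if __name__ == "__main__":
    a, N = 0.5, 1500
    nu = np.arange(N + 1)
    phi = np.exp(1j * 0.5 * nu) + np.exp(1j * 2.0 * nu)
    print(finite_sample_efficiency(phi, ar1_covariance_matrix(N + 1, a)))
    print(asymptotic_efficiency([0.5, 0.5], [0.5, 2.0], a))
```

With a single frequency in place of two, both quantities are close to 1, which
matches the corollary that follows Theorem 3.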

To extend this relation to an arbitrary process in the class $Y$ we shall use a
method due to G. Szegő, who suggested it in a discussion with the author, to
prove the asymptotic efficiency of the equidistributed estimate of the mean of
a process in the class $Y$. In statistical terminology, it consists in approximating
the process by an appropriate autoregressive scheme.

As $f(\lambda)$ has been supposed to be positive and continuous, it is seen (using a
well known argument) that it is possible for any $\delta > 0$ to find a trigonometric
polynomial $s(\lambda) = \sum_{\nu=0}^{p} a_\nu e^{-i\nu\lambda}$, where $p$ is a sufficiently large number, such that

$$f(\lambda) \le 1/|s(\lambda)|^2 \quad\text{and}\quad 1/|s(\lambda)|^2 - f(\lambda) \le \delta.$$

As we are interested only in the modulus of $s(\lambda)$, we can if necessary change the
$a$'s leaving $|s(\lambda)|$ unchanged in such a way that all the roots of the
characteristic equation are less than one in absolute value.

In this way we can find two spectral densities of the type (5) such that

$$f_1(\lambda) \le f(\lambda) \le f_2(\lambda), \tag{10}$$
$$f_2(\lambda) - f_1(\lambda) \le \delta. \tag{11}$$


For any linear estimate $c^*$ it follows from (10) that its variances calculated
under the three different hypotheses satisfy

$$D_{f_1}^2[c^*] \le D_f^2[c^*] \le D_{f_2}^2[c^*].$$

This gives us

$$\frac{2\pi}{\displaystyle\int_{-\pi}^{\pi} \frac{d\varphi(\lambda)}{f_1(\lambda)}} \le \liminf_{N\to\infty} \Phi(N)\,D_f^2[c^*_{BLU}] \le \limsup_{N\to\infty} \Phi(N)\,D_f^2[c^*_{BLU}] \le \frac{2\pi}{\displaystyle\int_{-\pi}^{\pi} \frac{d\varphi(\lambda)}{f_2(\lambda)}}.$$

Using (11) we see that

$$\lim_{N\to\infty} \Phi(N)\,D_f^2[c^*_{BLU}] = \frac{2\pi}{\displaystyle\int_{-\pi}^{\pi} \frac{d\varphi(\lambda)}{f(\lambda)}}.$$

Similarly (9) is extended to hold for all processes in the class $Y$. This
completes the proof of Theorem 3.

COROLLARY. In order that the L.S. estimate shall be asymptotically efficient
whatever be the spectral intensity, it is necessary and sufficient that the spectrum
$S(\varphi)$ consist of only one point.

In other words, the knowledge of the true covariance matrix gives us no
additional information of relevance to our problem if and only if $S(\varphi)$ consists
of only one point.

PROOF. Supposing the asymptotic efficiency of the L.S. estimate to be $e$, we
get from (4) using Schwarz' inequality

$$\frac{1}{e} = \int_{-\pi}^{\pi} f(\lambda)\,d\varphi(\lambda) \int_{-\pi}^{\pi} \frac{1}{f(\lambda)}\,d\varphi(\lambda) \ge \Big|\int_{-\pi}^{\pi} d\varphi(\lambda)\Big|^2 = 1.$$

The equality sign holds if and only if $f(\lambda) = $ const. for all $\lambda \in S(\varphi)$. This identity
is true for every spectral intensity in $Y$ if and only if $S(\varphi)$ reduces to a single
point. This condition looks at first very restrictive but, as we shall see later,
it is satisfied in the most important cases of analytical regression.
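
As a worked instance of the inequality, suppose (purely for illustration) that
$S(\varphi)$ consists of two points $\lambda_1, \lambda_2$ carrying masses $w$ and $1 - w$, and write
$f_1 = f(\lambda_1)$, $f_2 = f(\lambda_2)$ for the values of the density there; these symbols are
local to this example. Then (4) reduces to an elementary expression:

```latex
\frac{1}{e}
  = \bigl(w f_1 + (1-w) f_2\bigr)\Bigl(\frac{w}{f_1} + \frac{1-w}{f_2}\Bigr)
  = w^2 + (1-w)^2 + w(1-w)\Bigl(\frac{f_1}{f_2} + \frac{f_2}{f_1}\Bigr)
  = 1 + w(1-w)\,\frac{(f_1 - f_2)^2}{f_1 f_2}.
```

Hence $e = 1$ exactly when $w(1 - w)(f_1 - f_2)^2 = 0$, that is, when one of the masses
vanishes or the density takes the same value at both points, and $e < 1$ otherwise.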


6. Possible extensions. The extension of Theorem 3 to several regression 
variables is studied in the next section, so will not be discussed here. 

We have demanded that the disturbance process have an absolutely continu- 
ous spectrum with a positive continuous spectral density. This is likely to be a 
much more stringent condition than what is needed for the theorem to be true. 
Consider the case $\varphi_\nu = 1$. Then $\Phi(N) = N + 1$ and $R_n = 1$, so that $S(\varphi)$
consists only of the point $\lambda = 0$. The theorem states in this case that the
equidistributed estimate $\bar x = \sum_{\nu=0}^{N} x_\nu/(N + 1)$ of the mean value of a process
belonging to the class $Y$ is asymptotically efficient.

This result was first proved in [2] in the case of a continuous time parameter. 
The same proof can be used also in the considerably simpler case of a discrete 
parameter. The conditions imposed upon the process were weaker than those 
defining the class Y, and in [3] the result was extended to spectra containing 
also discrete and singular parts. This makes it at least plausible that Theorem 
3 and its corollary hold for a class of disturbances considerably larger than Y. 
However, the method of proof used in this paper does not seem to lend itself 
easily to such an extension. 


7. The case with several regression variables. In the treatment of this case 
we shall deal only with questions which did not appear when we had just one 
regression variable. The remaining part of the proof consists of a straightforward 
generalization of the procedure in Section 5. 

In order to describe the spectral properties of an $s$-dimensional sequence
$(\varphi_\nu^{(1)}, \varphi_\nu^{(2)}, \cdots, \varphi_\nu^{(s)})$ we consider the expressions

$$\Phi^{(j)}(N) = \sum_{\nu=0}^{N} |\varphi_\nu^{(j)}|^2,$$

which are supposed to be unbounded and slowly increasing as $N \to \infty$. We
suppose that the following limits exist

$$R_{jk}(n) = \lim_{N\to\infty} \frac{1}{\sqrt{\Phi^{(j)}(N)\,\Phi^{(k)}(N)}} \sum_{\nu=0}^{N} \varphi_{\nu+n}^{(j)}\,\bar\varphi_\nu^{(k)}. \tag{12}$$

Using the same argument as in Section 3 one can show that these limits are
invariant against translations and that they have the properties of correlation
matrices between $s$ stationary and stationarily correlated stochastic processes.
From Cramér's extension (cf. [1]) of Khintchine's theorem on the representation
of a correlation function we see that there exist $s^2$ complex-valued functions
of bounded variation $F_{jk}(\lambda)$, $j, k = 1, 2, \cdots, s$, defined in the interval $(-\pi, \pi)$
such that

$$R_{jk}(n) = \int_{-\pi}^{\pi} e^{in\lambda}\,dF_{jk}(\lambda).$$

For the increments of $F(\lambda)$ over an arbitrary interval in $(-\pi, \pi)$, the matrix
$\{\Delta F_{jk}(\lambda);\ j, k = 1, 2, \cdots, s\}$ is Hermitian and nonnegative. From this it follows
that the $F_{jj}$ are distribution functions.

Of course we want to exclude the case when the regression variables are
linearly dependent. We do this by assuming that the matrix $\{R_{jk}(0);\ 
j, k = 1, 2, \cdots, s\}$ is nonsingular.

The introduction of the spectral measure is not quite as straightforward as
for $s = 1$. Consider a rectangle $r$ with sides $(a_1, b_1), (a_2, b_2), \cdots, (a_s, b_s)$, in
the $s$-cube with sides $(-\pi, \pi)$. Denoting the difference operators corresponding
to these intervals by $\Delta_1, \Delta_2, \cdots, \Delta_s$, we form the nonnegative matrices

$$N_1 = \{\Delta_1 F_{jk}\}, \quad N_2 = \{\Delta_2 F_{jk}\}, \quad \cdots, \quad N_s = \{\Delta_s F_{jk}\}.$$

Let $P$ be a permutation $(1, 2, \cdots, s) \to (i_1, i_2, \cdots, i_s)$. To $P$ we associate the
determinant

$$D_P = |\Delta_{i_j} F_{jk};\ j, k = 1, 2, \cdots, s|. \tag{13}$$

We define the spectral measure of $r$ as

$$\varphi(r) = \frac{1}{s!} \sum_P D_P, \tag{14}$$

where the summation is extended over all the permutations. We have to show
that $\varphi$ is nonnegative. As the value of $\varphi$ for the whole cube is finite we can extend
$\varphi$ to a bounded measure in this cube.

As is well known there are Hermitian matrices $f_1, f_2, \cdots, f_s$ so that $N_\nu = f_\nu^2$,
$\nu = 1, 2, \cdots, s$, that is

$$\Delta_\nu F_{jk} = \sum_{n=1}^{s} f_{jn}^{(\nu)} f_{nk}^{(\nu)}.$$

Hence

$$D_P = \Big|\sum_{n_j} f_{jn_j}^{(i_j)} f_{n_j k}^{(i_j)}\Big| = \sum_{n_1, n_2, \cdots, n_s = 1}^{s} f_{1n_1}^{(i_1)} f_{2n_2}^{(i_2)} \cdots f_{sn_s}^{(i_s)}\,\big|f_{n_j k}^{(i_j)}\big|.$$

After permutation of the rows of the determinants

$$D_P = \sum_{l_1, l_2, \cdots, l_s = 1}^{s} (-1)^P f_{k_1 l_1}^{(1)} f_{k_2 l_2}^{(2)} \cdots f_{k_s l_s}^{(s)}\,\big|f_{l_j k}^{(j)}\big|,$$

where $(-1)^P$ is $+1$ or $-1$ according to whether $P$ is an even or odd permutation.
The permutation $(1, 2, \cdots, s) \to (k_1, k_2, \cdots, k_s)$ is the inverse of $P$ and
has the same order. But from the Hermitian property of the $f_\nu$,

$$\sum_P D_P = \sum_{l_1, l_2, \cdots, l_s = 1}^{s} \Big|\,\big|f_{l_j k}^{(j)}\big|\,\Big|^2 \ge 0.$$

The measure $\varphi$ obtained in this way will be called the spectral measure of the
regression variables. The set of points in $s$-space such that every cube containing
one of them in its interior has positive spectral measure is called the spectrum
of the regression variables and denoted by $S(\varphi)$. From $|R_{jk}(0)| \ne 0$ it follows
that $S(\varphi)$ is not empty. It is clearly symmetric with respect to permutations of
the coordinate axes.

THEOREM 4. The joint asymptotic efficiency of the L.S. estimates is given by

$$e = \frac{\Big|\displaystyle\int d\varphi\Big|^2}{\displaystyle\int f(u_1) \cdots f(u_s)\,d\varphi \int \frac{1}{f(u_1) \cdots f(u_s)}\,d\varphi}, \tag{15}$$

where the integrations are carried out over $S(\varphi)$.
PROOF. It is sufficient to deal with the case when $y_\nu$ is an autoregressive
process, as then we can apply the same approximation procedure as in Section 5.
Consider a matrix of the form

$$M = \Big\{\int_{-\pi}^{\pi} k(\lambda)\,dF_{jk}(\lambda);\ j, k = 1, 2, \cdots, s\Big\},$$

where $k(\lambda)$ is a continuous function. Then one can show easily that

$$|M| = \int_{S(\varphi)} k(u_1) \cdots k(u_s)\,d\varphi(u_1, \cdots, u_s).$$

Denote by $A$, $B$, $C$, respectively, the matrices obtained in this way by putting
$k(\lambda) = 1$, $f(\lambda)$, $1/f(\lambda)$. It is easily seen that $A = \{R_{jk}(0)\}$ and that $B$ and $C$
are nonsingular.

Proceeding in the same way as before we derive the relation

$$\lim_{N\to\infty} \frac{1}{\sqrt{\Phi^{(j)}(N)\,\Phi^{(k)}(N)}}\,\operatorname{cov}\Big[\sum_{\nu=0}^{N} x_\nu\bar\varphi_\nu^{(j)};\ \sum_{\nu=0}^{N} x_\nu\bar\varphi_\nu^{(k)}\Big] = 2\pi \int_{-\pi}^{\pi} f(\lambda)\,dF_{jk}(\lambda).$$

From this,

$$\lim_{N\to\infty} \sqrt{\Phi^{(j)}(N)\,\Phi^{(k)}(N)}\,\operatorname{cov}[c_{j,LS};\ c_{k,LS}] = 2\pi m_{jk},$$

where $\{m_{jk};\ j, k = 1, 2, \cdots, s\} = A^{-1}BA^{-1}$. To find the asymptotic covariances
of the B.L.U. estimates we use the same linear transformation as before and obtain

$$\lim_{N\to\infty} \sqrt{\Phi^{(j)}(N)\,\Phi^{(k)}(N)}\,\operatorname{cov}[c^*_{j,BLU};\ c^*_{k,BLU}] = 2\pi n_{jk},$$

where $\{n_{jk};\ j, k = 1, 2, \cdots, s\} = C^{-1}$.

We define the (joint, linear) efficiency $e$ of $s$ estimates with moment matrix
$\alpha$ as

$$e = \lim_{N\to\infty} e_N, \qquad e_N = |\beta|/|\alpha|,$$

where $\beta$ is the moment matrix corresponding to the B.L.U. estimates.

Combining the obtained relations we get

$$e = \frac{|A|^2}{|B|\,|C|} = \frac{\Big|\displaystyle\int d\varphi\Big|^2}{\displaystyle\int f(u_1) \cdots f(u_s)\,d\varphi \int \frac{1}{f(u_1) \cdots f(u_s)}\,d\varphi}.$$

COROLLARY. In order that the L.S. estimates shall be (jointly) asymptotically
efficient for every disturbance of the class $Y$ it is necessary and sufficient that the
spectrum $S(\varphi)$ of the regression variables contain only one point and the symmetric
images of it.

PROOF. From Schwarz' inequality it follows immediately that

$$\Big|\int d\varphi\Big|^2 \le \int f(u_1) \cdots f(u_s)\,d\varphi \int \frac{1}{f(u_1) \cdots f(u_s)}\,d\varphi$$

with equality if and only if $f(u_1)f(u_2) \cdots f(u_s) = $ const. for almost all
$(u_1, u_2, \cdots, u_s) \in S(\varphi)$.

If the spectrum contains at least two points which are not symmetric, the
asymptotic efficiency cannot be one for all residuals of type $Y$. The sufficiency
of the condition is obvious.


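8. The case of analytical regression. When the regression sequences are given
a priori in the form of analytic expressions in $\nu$, we shall speak of analytical
regression as opposed to the case when $\{\varphi_\nu^{(1)}\}, \{\varphi_\nu^{(2)}\}, \cdots$ are obtained as
measurements of certain variables which are of a nondeterministic structure.

Let us suppose that we have a pair of regression variables of the type

$$\varphi_\nu = \nu^p \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF(\lambda), \qquad \psi_\nu = \nu^q \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dG(\lambda),$$

where $p$ and $q$ are nonnegative integers. Both $F$ and $G$ are of bounded variation
but not necessarily real-valued. To bar cases where no consistent estimate of
the regression coefficients exists, we shall suppose that both $F$ and $G$ have at
least one discontinuity. This condition is satisfied in the applications we are
going to consider.

Separating the continuous and discontinuous parts of $F$ and $G$ we have

$$F(\lambda) = F_c(\lambda) + F_d(\lambda), \qquad G(\lambda) = G_c(\lambda) + G_d(\lambda).$$

Let us put

$$c_\nu = \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF_c(\lambda); \qquad d_\nu = \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF_d(\lambda);$$
$$\gamma_\nu = \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dG_c(\lambda); \qquad \delta_\nu = \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dG_d(\lambda).$$

We have

$$\lim_{N\to\infty} \frac{1}{2N} \sum_{\nu=-N}^{N} |c_\nu|^2 = \lim_{N\to\infty} \frac{1}{2N} \sum_{\nu=-N}^{N} |\gamma_\nu|^2 = 0 \tag{16a}$$

and

$$\lim_{N\to\infty} \frac{1}{2N} \sum_{\nu=-N}^{N} |d_\nu|^2 = \sum_\mu |m_\mu|^2 > 0, \qquad \lim_{N\to\infty} \frac{1}{2N} \sum_{\nu=-N}^{N} |\delta_\nu|^2 = \sum_\mu |\mu_\mu|^2 > 0. \tag{16b}$$

The frequencies where either $F(\lambda)$ or $G(\lambda)$ has a saltus are denoted $\lambda_\mu$, with
$\Delta F(\lambda_\mu) = m_\mu$, and $\Delta G(\lambda_\mu) = \mu_\mu$.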
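Consider the quantities

$$r_N(s) = \frac{1}{N^{2r+1}} \sum_{\nu=1}^{N} \varphi_\nu\bar\psi_{\nu+s}, \qquad r_N'(s) = \frac{1}{N^{2r+1}} \sum_{\nu=1}^{N} d_\nu\bar\delta_{\nu+s}\,\nu^p(\nu + s)^q, \tag{17}$$

where $r = (p + q)/2$. We have for large values of $N$

$$|r_N(s) - r_N'(s)| \le \frac{2}{N} \sum_{\nu=1}^{N} |c_\nu\gamma_{\nu+s}| + \frac{2}{N} \sum_{\nu=1}^{N} |c_\nu\delta_{\nu+s}| + \frac{2}{N} \sum_{\nu=1}^{N} |d_\nu\gamma_{\nu+s}|$$
$$\le 2\sqrt{\frac{1}{N}\sum_{\nu=1}^{N} |c_\nu|^2\,\frac{1}{N}\sum_{\nu=1}^{N} |\gamma_{\nu+s}|^2} + 2\sqrt{\frac{1}{N}\sum_{\nu=1}^{N} |c_\nu|^2\,\frac{1}{N}\sum_{\nu=1}^{N} |\delta_{\nu+s}|^2} + 2\sqrt{\frac{1}{N}\sum_{\nu=1}^{N} |d_\nu|^2\,\frac{1}{N}\sum_{\nu=1}^{N} |\gamma_{\nu+s}|^2}.$$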


Using (16) we see that $\lim_{N\to\infty} |r_N(s) - r_N'(s)| = 0$. But it is easy to calculate
the limit of $r_N'(s)$. If the number of discontinuities is finite we get immediately

$$\lim_{N\to\infty} r_N'(s) = \lim_{N\to\infty} \frac{1}{N^{2r+1}} \sum_{\mu,\kappa} \sum_{\nu=1}^{N} \nu^p m_\mu e^{i\nu\lambda_\mu}(\nu + s)^q\,\bar\mu_\kappa e^{-i(\nu+s)\lambda_\kappa} = \frac{1}{2r + 1} \sum_\mu m_\mu\bar\mu_\mu e^{-is\lambda_\mu}. \tag{18}$$

If the number of discontinuities is infinite, a slight modification of the previous
inequality shows that (18) holds in this case, too.

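It is now easy to deal with the case when the regression variables are of the
form

$$\varphi_\nu = \nu^p \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF_0(\lambda) + \nu^{p-1} \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF_1(\lambda) + \cdots + \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF_p(\lambda). \tag{19}$$

For $\nu > 0$ we can write this as

$$\varphi_\nu = \nu^p\Big\{\int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF_0(\lambda) + a_\nu^{(1)} + \cdots + a_\nu^{(p)}\Big\}.$$

Since

$$\sum_{\nu=1}^{\infty} |a_\nu^{(s)}|^2 < \infty,$$

it follows that $a_\nu^{(s)}$ have Fourier-Stieltjes representations with absolutely
continuous weighting functions. Hence

$$\varphi_\nu = \nu^p \int_{-\pi}^{\pi} e^{i\nu\lambda}\,dF'(\lambda),$$

where $F'(\lambda)$ has the same discontinuous part as $F_0(\lambda)$. We shall call the frequencies
corresponding to these discontinuities the stressed frequencies of the sequence $\{\varphi_\nu\}$.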


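THEOREM 5. A pair of regression variables $(\varphi, \psi)$ where each one is of the type
(19) is stationary with the spectral distribution functions

$$F_{11}(\lambda) = \frac{1}{\sum_\mu |m_\mu|^2} \sum_{\lambda_\mu < \lambda} |m_\mu|^2,$$

$$F_{12}(\lambda) = \frac{\sqrt{(2p + 1)(2q + 1)}}{2r + 1}\,\frac{1}{\sqrt{\sum_\mu |m_\mu|^2 \sum_\mu |\mu_\mu|^2}} \sum_{\lambda_\mu < \lambda} m_\mu\bar\mu_\mu,$$

$$F_{21}(\lambda) = \frac{\sqrt{(2p + 1)(2q + 1)}}{2r + 1}\,\frac{1}{\sqrt{\sum_\mu |m_\mu|^2 \sum_\mu |\mu_\mu|^2}} \sum_{\lambda_\mu < \lambda} \bar m_\mu\mu_\mu,$$

$$F_{22}(\lambda) = \frac{1}{\sum_\mu |\mu_\mu|^2} \sum_{\lambda_\mu < \lambda} |\mu_\mu|^2.$$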


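PROOF. Putting $\psi = \varphi$ and $s = 0$ in (17) we get from (18)

$$\lim_{N\to\infty} \frac{1}{N^{2p+1}} \sum_{\nu=1}^{N} |\varphi_\nu|^2 = \frac{1}{2p + 1} \sum_\mu |m_\mu|^2, \qquad \lim_{N\to\infty} \frac{1}{N^{2q+1}} \sum_{\nu=1}^{N} |\psi_\nu|^2 = \frac{1}{2q + 1} \sum_\mu |\mu_\mu|^2.$$

Hence

$$\Phi^{(1)}(N) \sim \frac{N^{2p+1}}{2p + 1} \sum_\mu |m_\mu|^2, \qquad \Phi^{(2)}(N) \sim \frac{N^{2q+1}}{2q + 1} \sum_\mu |\mu_\mu|^2. \tag{20}$$

These functions are slowly increasing and, as we have already seen, $\{\varphi_\nu\}$ and
$\{\psi_\nu\}$ are regular. It follows that the pair $(\varphi, \psi)$ of regression variables is
stationary.

We get from (17) and (18) that

$$R_{11}(n) = \frac{1}{\sum_\mu |m_\mu|^2} \sum_\mu |m_\mu|^2 e^{in\lambda_\mu},$$

$$R_{12}(n) = \frac{\sqrt{(2p + 1)(2q + 1)}}{2r + 1}\,\frac{1}{\sqrt{\sum_\mu |m_\mu|^2 \sum_\mu |\mu_\mu|^2}} \sum_\mu m_\mu\bar\mu_\mu e^{in\lambda_\mu},$$

$$R_{21}(n) = \frac{\sqrt{(2p + 1)(2q + 1)}}{2r + 1}\,\frac{1}{\sqrt{\sum_\mu |m_\mu|^2 \sum_\mu |\mu_\mu|^2}} \sum_\mu \bar m_\mu\mu_\mu e^{in\lambda_\mu},$$

$$R_{22}(n) = \frac{1}{\sum_\mu |\mu_\mu|^2} \sum_\mu |\mu_\mu|^2 e^{in\lambda_\mu}.$$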

These relations determine the functions $F_{jk}(\lambda)$. We note that

$$|R_{jk}(0);\ j, k = 1, 2| = 1 - |R_{12}(0)|^2 = 1 - \frac{(2p + 1)(2q + 1)}{(2r + 1)^2}\,\frac{\big|\sum_\mu m_\mu\bar\mu_\mu\big|^2}{\sum_\mu |m_\mu|^2 \sum_\mu |\mu_\mu|^2}.$$

It is easily seen that this is positive, as is required if one wants to apply Theorem
4, if and only if at least one of the two following conditions is satisfied:

i. $p \ne q$.

ii. There is no linear relation $Am_\mu + B\mu_\mu = 0$ for all $\mu$. We note that if there
is one such linear relation this implies that the stressed frequencies of $\{\varphi_\nu\}$ and
$\{\psi_\nu\}$ coincide.
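
Conditions i and ii can be read off numerically from the determinant above. The
following minimal sketch evaluates $1 - |R_{12}(0)|^2$ for given exponents and
saltuses; the function name and the sample jump values are hypothetical, and
$2r + 1$ is written as $p + q + 1$.

```python
import numpy as np


def gram_determinant(p, q, m, mu):
    """1 - |R_12(0)|^2 for exponents p, q and saltus vectors m (of F) and mu (of G)."""
    m = np.asarray(m, dtype=complex)
    mu = np.asarray(mu, dtype=complex)
    cross = np.abs(np.sum(m * np.conj(mu))) ** 2
    norms = np.sum(np.abs(m) ** 2) * np.sum(np.abs(mu) ** 2)
    factor = (2 * p + 1) * (2 * q + 1) / (p + q + 1) ** 2
    return 1.0 - factor * cross / norms


if __name__ == "__main__":
    # Equal exponents and proportional saltuses: both conditions fail.
    print(gram_determinant(1, 1, [1, 2], [2, 4]))
    # Different exponents: condition i holds even with proportional saltuses.
    print(gram_determinant(1, 2, [1, 2], [2, 4]))
    # Equal exponents, saltuses not proportional: condition ii holds.
    print(gram_determinant(1, 1, [1, 2], [2, -1]))
```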

Let us study the case of $s$ regression variables of the type described above.
If $\lambda_1, \lambda_2, \cdots$ is the set of all the stressed frequencies we will denote the saltus
at $\lambda_\nu$ corresponding to the $i$th variable by $\mu_i^{(\nu)}$. The value of $p$ for the $i$th variable
will be denoted by $p_i$. Then it is easily seen that the matrix $A = \{R_{jk}(0)\}$ is
given by $A = D\Lambda D$ where

$$\Lambda = \Big\{\frac{\sum_\nu \mu_j^{(\nu)}\bar\mu_k^{(\nu)}}{p_j + p_k + 1};\ j, k = 1, 2, \cdots, s\Big\}$$

and $D$ is the diagonal matrix

$$D = \Big\{\frac{\sqrt{2p_j + 1}}{\sqrt{\sum_\nu |\mu_j^{(\nu)}|^2}}\Big\},$$

so that $A$ and $\Lambda$ are singular or nonsingular at the same time. To get a criterion
for the nonsingularity of $\Lambda$ we consider the quadratic form

$$z'\Lambda\bar z = \sum_\nu \int_0^1 \Big|\sum_{j=1}^{s} \mu_j^{(\nu)} x^{p_j} z_j\Big|^2 dx,$$

where $z$ is a column vector with components $z_1, z_2, \cdots, z_s$. If $\Lambda$ is singular there
exists a nonzero vector $z$ for which the above quadratic form vanishes. But
from this follows

$$\sum_{j=1}^{s} \mu_j^{(\nu)} x^{p_j} z_j = 0$$

for all $\nu$ and $x$. We will assume that we have labeled our regression variables
in such a way that

$$p_1 = \cdots = p_{m_1} < p_{m_1+1} = \cdots = p_{m_2} < \cdots < p_{m_{t-1}+1} = \cdots = p_{m_t}, \qquad m_0 = 0,\ m_t = s.$$

Then we get

$$\sum_{j=m_{i-1}+1}^{m_i} \mu_j^{(\nu)} z_j = 0, \qquad i = 1, 2, \cdots, t,$$

for all $\nu$, and as $z \ne 0$ at least one of these relations is nontrivial. The converse
of this is shown in the same way. Hence a necessary and sufficient condition for
$\Lambda$ to be nonsingular is that in none of the $t$ classes there exists a nontrivial linear
relation between the saltuses corresponding to regression variables in such a
class. For $s = 2$ this reduces to conditions i and ii above.

Assuming that $A$ is nonsingular, we will study the spectrum $S(\varphi)$. It is clear
that it is discrete. Consider a point $l = (l_1, l_2, \cdots, l_s)$. If $\varphi(l) > 0$ it is
necessary that $l_1, l_2, \cdots, l_s$ coincide with some of the stressed frequencies, but this is
not always sufficient. Say that $l = (\lambda_{\nu_1}, \lambda_{\nu_2}, \cdots, \lambda_{\nu_s})$ where the $\nu$'s are not
necessarily distinct. Then

$$\varphi(l) = \int_0^1 \cdots \int_0^1 |D(x)|^2\,dx_1\,dx_2 \cdots dx_s,$$

where $D(x) = |\mu_j^{(\nu_k)} x_k^{p_j};\ j, k = 1, 2, \cdots, s|$.
To show this, consider a permutation P:(1, 2, --- 8) — (i, i2, «++ a) and 
put 1; = P;. Then for j,k = 1,2, ---+s 


9 &9 


’ 


fr? ag? eC 


D+ mh +1 


1 
| Ap(x) dx; dr. --- da, 
0 


“0 
where
al ae (¥p) =(¥p,)  PitPR | 
Ap(x) = | mw; ee x5, |. 


But 


Ap(x) = TT wire een(—1)? | al’? oP? | 
n=) 
and hence >> p Ap(x) = | D(z) |’. 
We are especially interested in the case when $S(\varphi)$ reduces to a single point
(and its symmetric images). Let us treat two important cases.
First assume that all $p_i$ are different. If $g(l) = 0$ we see that $D(x) = 0$, implying
that

$$\mu_{i_1}^{(\nu_1)} \mu_{i_2}^{(\nu_2)} \cdots \mu_{i_s}^{(\nu_s)} = 0$$

for all permutations $(1, 2, \ldots, s) \to (i_1, i_2, \ldots, i_s)$. If one regression variable
has two stressed frequencies, say $\varphi^{(1)}$ has $\lambda'$ and $\lambda''$, we choose

$$l' = (\lambda', l_2, l_3, \ldots, l_s), \qquad l'' = (\lambda'', l_2, l_3, \ldots, l_s)$$

where $l_2$ is a stressed frequency of $\varphi^{(2)}$ and so on. Taking $(i_1, i_2, \ldots, i_s) =
(1, 2, \ldots, s)$ we see that $g(l') > 0$ and $g(l'') > 0$. But $l'$ cannot be obtained by
permutations from $l''$. On the other hand, if each variable has only one stressed
frequency it follows that $S(\varphi)$ consists of only one point because if $g(l) > 0$
then there is at least one permutation $(i_1, i_2, \ldots, i_s)$ so that

$$\mu_{i_1}^{(\nu_1)} \neq 0, \quad \mu_{i_2}^{(\nu_2)} \neq 0, \quad \ldots, \quad \mu_{i_s}^{(\nu_s)} \neq 0.$$

But then $l$ must be a permutation of $(\lambda_1, \lambda_2, \ldots, \lambda_s)$ where $\lambda_1$ is the only stressed
frequency of $\varphi^{(1)}$ and so on. This proves:

COROLLARY 1. If all $p_i$ are different, it is necessary and sufficient for the L.S.
estimates to be asymptotically efficient, whatever be the spectral density $f(\lambda)$, that
each regression variable have only one stressed frequency.

Assume now instead that all $p_i = p$. Then

$$D(x) = |\mu_j^{(\nu_k)};\ j, k = 1, 2, \ldots, s| \, (x_1 x_2 \cdots x_s)^p.$$

If $g(l) > 0$ it is clear that the $\nu$'s must be different. If each regression variable
has only one stressed frequency $S(\varphi)$ must consist of one point only. We are
going to show that the converse of this is true.
Let $l$ be the only point in $S(\varphi)$ and suppose we have labeled our variables so
that $l = (\lambda_1, \lambda_2, \ldots, \lambda_s)$. Then the column vectors

$$v_1 = [\mu_1^{(1)}, \mu_2^{(1)}, \ldots, \mu_s^{(1)}],$$
$$v_2 = [\mu_1^{(2)}, \mu_2^{(2)}, \ldots, \mu_s^{(2)}],$$
$$\cdots$$
$$v_s = [\mu_1^{(s)}, \mu_2^{(s)}, \ldots, \mu_s^{(s)}]$$

are linearly independent and span $R^s$. If there is one more stressed frequency,
say $\lambda_{s+1}$, then

$$v_{s+1} = [\mu_1^{(s+1)}, \mu_2^{(s+1)}, \ldots, \mu_s^{(s+1)}] \neq 0,$$

and there are constants $\beta_i$ so that

$$v_{s+1} = \beta_1 v_1 + \beta_2 v_2 + \cdots + \beta_s v_s.$$

As the point

$$l^{(i)} = (\lambda_1, \lambda_2, \ldots, \lambda_{i-1}, \lambda_{i+1}, \ldots, \lambda_s, \lambda_{s+1})$$

is not an image of $l$ it should have spectral mass zero. Then the determinant
formed from the corresponding column vectors should vanish:

$$|v_1, v_2, \ldots, v_{i-1}, v_{i+1}, \ldots, v_s, v_{s+1}| = \sum_{j=1}^{s} \beta_j \, |v_1, v_2, \ldots, v_{i-1}, v_{i+1}, \ldots, v_s, v_j|$$

$$= \beta_i \, |v_1, v_2, \ldots, v_{i-1}, v_{i+1}, \ldots, v_s, v_i| = 0$$

so that $\beta_i = 0$. As this would hold for all $i$ we have obtained a contradiction.
COROLLARY 2. If all $p_i = p$ then a necessary and sufficient condition for the L.S.
estimates to be asymptotically efficient, whatever be the spectral density $f(\lambda)$, is that
each regression variable should have one stressed frequency only.
COROLLARY 3. In the case of parabolic regression, $\varphi_\nu^{(n)} = \nu^n$ and

$$y_\nu = \sum_n c_n \nu^n + x_\nu,$$

the L.S. estimates are asymptotically efficient.

This follows immediately as $|A| > 0$ because the $p$'s corresponding to different
regression variables are different and each component has $\lambda = 0$ as the only
stressed frequency.

COROLLARY 4. In the case of trigonometric regression,

$$\varphi_\nu^{(n)} = e^{i\nu\lambda_n} \quad \text{and} \quad y_\nu = \sum_n c_n e^{i\nu\lambda_n} + x_\nu,$$

the L.S. estimates are asymptotically efficient if the $\lambda$'s are different.

In this case $p = 1$ for each component but as the $n$th regression variable
has only one stressed frequency, $\lambda_n$, it follows that $|A| > 0$ and the corollary
holds.

Other types of analytical regression could be studied in a similar way. In
order that the same method shall be applicable, the regression variables must
not be too small at infinity (in which case $\Phi(\infty) < \infty$ and no consistent estimate
exists) or too large (so that $\Phi(N)$ is not slowly increasing and the sequence is not
stationary). The first case does not seem to be of much interest but perhaps
this cannot be said about the second one.

Consider the case of exponential regression, that is $\varphi_\nu = a^\nu$, and let $a > 1$.
Then for $n \geq 0$


N--@ @(N) ve() N--2 
so that ¢, is a regular sequence. As 


@(N + n) Qn 
= l 
on * * 


the sequence is not stationary and one verifies easily that it has no spectrum.
One could still in a natural way attribute certain stressed frequencies to functions
of this type of growth and in this case we would get $\lambda = 0$ as the only
stressed frequency. We shall not pursue this question but only show that the
L.S. estimate of the corresponding regression coefficient is still not asymptotically
efficient for all disturbances of the class Y.

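As $\varphi_\nu$ is increasing very fast, it seems probable that $c^0 = y_N/\varphi_N = a^{-N} y_N$
would be an asymptotically good estimate. Its variance is

$$D^2[c^0] = a^{-2N} D^2[x_N] = a^{-2N} \int_{-\pi}^{\pi} f(\lambda) \, d\lambda.$$

For the estimate

$$c_{LS} = \Big(\sum_\nu y_\nu a^\nu\Big) \Big/ \Big(\sum_\nu a^{2\nu}\Big)$$

we have

$$D^2[c_{LS}] = \left(\frac{a^2 - 1}{a^{2N+2} - 1}\right)^2 \int_{-\pi}^{\pi} \left|\frac{1 - a^{N+1} e^{i(N+1)\lambda}}{1 - a e^{i\lambda}}\right|^2 f(\lambda) \, d\lambda$$

$$= \left(\frac{a^2 - 1}{a^{N+1} - a^{-N-1}}\right)^2 \int_{-\pi}^{\pi} \left|\frac{a^{-N-1} - e^{i(N+1)\lambda}}{1 - a e^{i\lambda}}\right|^2 f(\lambda) \, d\lambda,$$

so that

$$D^2[c_{LS}] \sim \frac{(1 - a^2)^2}{a^{2N+2}} \int_{-\pi}^{\pi} \frac{1}{|1 - a e^{i\lambda}|^2} f(\lambda) \, d\lambda,$$

which gives us

$$\lim_{N \to \infty} \frac{D^2[c^0]}{D^2[c_{LS}]} = \frac{a^2 \int_{-\pi}^{\pi} f(\lambda) \, d\lambda}{(1 - a^2)^2 \int_{-\pi}^{\pi} \dfrac{f(\lambda)}{|1 - a e^{i\lambda}|^2} \, d\lambda}.$$

If the disturbance has most of its spectral mass concentrated in the neighborhood
of $\lambda = 0$, the last expression is near $[a/(1 + a)]^2 < 1$. Thus the L.S. estimate
is not asymptotically efficient for all spectral densities in this case.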







9. The case of fixed variates. When nothing is known about the way in which 
the regression variables are generated, it is clearly impossible to make any 
general statement about the asymptotic efficiency of the L.S. estimates of the 
regression coefficients. 

Let us consider a very simple model of generation which is not of the analytical
type. For simplicity let us consider the case $s = 1$. Let $\{\varphi_\nu\}$ be a stochastic
process independent of the residual process, stationary (in the strict sense), and
ergodic. We denote its spectral distribution function by $P(\lambda)$. Then it is known
that (with probability one) the sequence $\{\varphi_\nu\}$ has a spectrum and that the
spectral mass in $(\alpha, \beta)$ is

$$\int_\alpha^\beta dP(\lambda) \Big/ \int_{-\pi}^{\pi} dP(\lambda)$$

(cf. [4]). It follows from Theorem 3 that the asymptotic efficiency of the L.S.
estimate, regarding the $\varphi$'s as fixed variates, is almost certainly


© 2 7 | " 
g? = dP( l/! IP ( / ‘(r) dP(A). 
‘ lf PQ) 7a) 2% |, £0) aPo) 


If the spectrum of $\{\varphi_\nu\}$ contained more than one point we see that $e < 1$ (cf.
the corollary of Theorem 3).

It would be interesting to know how generally this holds. A class of processes,
generating the regression variables in a natural way, could be defined as follows.
Let $\Delta$ be a finite difference operator and suppose that $\{\Delta\varphi_\nu\}$ forms a stationary
process. One simple case is when $\Delta\varphi_\nu = \varphi_{\nu+1} - \varphi_\nu$ and $\Delta\varphi_\nu$ is purely random.
Then $\varphi_\nu$ would be a temporally homogeneous differential process. Are the realizations
of such processes regular and stationary in the sense of Section 3, and if
so, what spectra do they have?


10. The information of the covariance matrix. If the regression sequence
$\{\varphi_\nu\}$ has a spectrum consisting of a single point we have seen that the L.S.
estimate is asymptotically efficient. We have then said that the knowledge of 
the true covariance matrix of the disturbance does not give us any information 
with respect to the problem of inference under consideration. In this section 
we shall make this statement more precise. 

Suppose that we have two covariance matrices $M_1$ and $M_2$ for the residual
$\{x_\nu\} \in Y$. Let the B.L.U. estimates calculated under the two hypotheses $M_1$ and
$M_2$ be $c_1^*$ and $c_2^*$. If

$$\lim_{N \to \infty} \frac{D[c_1^*]}{D[c_2^*]} = 1$$

for all possible $M$, $M_1$, $M_2$, we shall say that the covariance matrix gives us
no information.

Let the spectral density under the two hypotheses be $f_1(\lambda)$ and $f_2(\lambda)$ and let
the true one be $f(\lambda)$ corresponding to a covariance matrix $M$. The corollary of
Theorem 1 implies that the correlation coefficient between $c_{LS}$ and the B.L.U.
estimate tends to one as $N$ tends to infinity (we are considering the case when
the spectrum of $\varphi_\nu$ is reduced to a single point). Hence

$$D_1[c_1^* - c_{LS}] / D_1[c_{LS}] \to 0.$$

But because of (3) we get

$$D_f[c_1^* - c_{LS}] / D_f[c_{LS}] \to 0.$$

But the triangle inequality gives us

$$D_f[c_1^* - c_{LS}] \geq |D_f[c_1^*] - D_f[c_{LS}]|$$

and hence

$$D_f[c_1^*] / D_f[c_{LS}] \to 1.$$

Similarly we get

$$D_f[c_2^*] / D_f[c_{LS}] \to 1,$$

so that

$$D_f[c_1^*] / D_f[c_2^*] \to 1, \qquad N \to \infty.$$

Thus we have shown that if the spectrum of the regression sequence contains
only one point, then knowledge of the covariance matrix does not give us any
information of interest with respect to estimating the regression coefficient.
A SINGLE-SAMPLE MULTIPLE DECISION PROCEDURE FOR RANKING
VARIANCES OF NORMAL POPULATIONS¹


By ROBERT E. BECHHOFER AND MILTON SOBEL


Cornell University


Summary. A single-sample multiple decision procedure for ranking variances 
of normal populations is described. Exact small-sample methods and a large- 
sample method are given for computing the sample sizes necessary to guarantee 
a preassigned probability of a correct ranking under specified conditions on 
certain variance ratios. Some tables computed by these methods are provided. 


1. Introduction. In an earlier paper [3], one of the present authors proposed a 
single-sample multiple decision procedure for ranking means of normal popula- 
tions with known variances. Although the procedure described in that paper 
can be used for ranking variances if the sample sizes are sufficiently large, the 
question as to which type of large-sample approximation would give satisfactory 
results required further study. In addition, since much applied statistics involves 
small sample sizes, it was felt that it would be desirable to develop an exact small- 
sample theory for ranking variances of normal populations. The formulation 
of the ranking problem as given in this paper is the same as the one given in the 
earlier paper. However, the earlier paper treats the problem somewhat more 
generally, and the reader is referred to it for additional background and motiva- 
tion. 

Neyman and Pearson's [10] $L_1$ test as modified by Bartlett [2] (with the new
tables of Thompson and Merrington [13]) is the best known and most widely
used test² for the homogeneity of variances. However, even in situations where
the test is appropriate,³ it has a very important deficiency, namely, that its
power against various types of alternatives has not been determined.

In many situations the test is used inappropriately, particularly when the 
experimenter has strong a priori reasons for believing that the population vari- 
ances can not, in fact, be exactly equal. Many times in such situations the ex- 
perimenter would like to know, for example, which population has the smallest 
variance. What he requires is a decision procedure which will tell him which 
population to choose, and an operating characteristic curve which will tell him
the probability of his making a correct choice if he follows the given decision
procedure. The ranking procedure described in the next section is designed to
handle this latter situation.

¹ Received 6/18/53, revised 2/2/54.

² Recently Hartley [7] introduced an easier (from the computational viewpoint) but less
powerful test of the same hypothesis based on the maximum F-ratio. See also Cochran's
test [5], [6] for the significance of the largest of a set of sample estimates of variance.

³ For example, before attempting an analysis of variance test, an experimenter might
want to do some preliminary sampling in order to obtain information concerning the validity
of the assumption of homogeneous variances.


2. The ranking (multiple decision) approach. 
2.1. The mathematical model and related definitions. Let $X_{ij}$ be normally and
independently distributed chance variables $N(X_{ij} \mid \mu_i, \sigma_i^2)$,

$(i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, N_i)$.

We assume that the $\mu_i$ are known, and that the $\sigma_i^2$ are unknown. (If the $\mu_i$ are
known linear combinations of parameters which themselves are unknown and
which must be estimated from the data (a typical situation in regression problems),
the only effect will be to change the degrees of freedom associated with
the estimate of each $\sigma_i^2$.) Let

(1)   $\sigma_{[1]}^2 \leq \sigma_{[2]}^2 \leq \cdots \leq \sigma_{[k]}^2$

be the ranked $\sigma_i^2$, and let

(2)   $\theta_{i,j} = \sigma_{[i]}^2 / \sigma_{[j]}^2 \qquad (i, j = 1, 2, \ldots, k)$

be the variance ratios; we assume that it is not known which population is associated
with $\sigma_{[i]}^2$. We further assume that for each population, the only parameter
of interest is the population variance, the “‘best’’ population being the one having 
the smallest variance, the “second best” being the one having the second smallest 
variance, etc. Alternatively, we might have defined the “‘best’’ population as 
being the one having the largest variance, etc.; the mathematical theory is simi- 
lar for both cases. 

The $k$ populations might be $k$ different lots of ammunition, and $\sigma_i^2$ might be
the (population) target dispersion of the $i$th lot, or the $k$ populations might be $k$
different measuring instruments, and $\sigma_i^2$ might be the (population) variance of
measurement of the $i$th instrument. This variance, which characterizes the
reproducibility of repeated measurements of the same quantity, can be used as
an index of the precision of the measuring instrument. We would like on the
basis of a sample of $\sum_{i=1}^{k} N_i$ independent observations to make some inferences
about the "bestness" of the populations.

Our inferences will be based on the sample variances, by which we mean the
best unbiased estimates of the corresponding population variances. The sample
variance from the $i$th population, and the number of degrees of freedom (d.f.)
associated with this estimate, will be denoted by $s_i^2$ and $n_i$, respectively. (For
simplicity of notation no attempt will be made in this paper to distinguish between
chance variables and their observed values.) The sample variance associated
with the population having population variance $\sigma_{[i]}^2$, and the number of
d.f. associated with this estimate, will be denoted by $s_{(i)}^2$ and $n_{(i)}$, respectively.
Thus

(3)   $n_{(i)} s_{(i)}^2 / \sigma_{[i]}^2 = \chi_{n_{(i)}}^2$.

The ranked $s_i^2$ will be denoted by

(4)   $s_{[1]}^2 < s_{[2]}^2 < \cdots < s_{[k]}^2$.

(If two or more $s_i^2$ are equal, they should be "ranked" by using a randomized
procedure which assigns equal probability to each ordering.)

2.2. The goals, requirements, and procedures. Different goals are appropriate 
for different practical situations. We shall assume that in each situation it is 
the experimenter’s responsibility to decide, before taking any observations, pre- 
cisely what his goal is. Two representative goals will be considered here. All 
problems of dividing the k populations into groups will be special cases of these 
two, or will require a similar development. 

GOAL I. To divide the $k$ populations into two groups, the $t$ "best" and the
$k - t$ "worst," the $t$ best being unordered and the $k - t$ worst being unordered
$(1 \leq t \leq k - 1)$.

GOAL II. To divide the $k$ populations into $t + 1$ groups, the $t$ "best" and the
$k - t$ "worst," the $t$ best being ordered and the $k - t$ worst being unordered
$(1 \leq t \leq k - 1)$.

It should be clear that, for Goal I, the problem of choosing the ¢ ‘“‘best’’ is 
logically equivalent to choosing the k — ¢ “worst.” It should also be noted 
that, for Goal II when t = k — 1, the problem is that of requiring a complete 
ranking. The goals coincide for t = 1. 

For Goal I it is assumed that the experimenter can specify a smallest value of
$\theta_{t+1,t}$, say $\theta_{t+1,t}^*$, that he desires to detect. The experimenter also must specify
the smallest acceptable probability of achieving Goal I when $\theta_{t+1,t} \geq \theta_{t+1,t}^*$.

For Goal II it is assumed that the experimenter can specify a smallest value of
each $\theta_{i+1,i}$, say $\theta_{i+1,i}^*$ $(i = 1, 2, \ldots, t)$, that he desires to detect. He also must
specify the smallest acceptable probability of achieving Goal II when $\theta_{i+1,i} \geq
\theta_{i+1,i}^*$ $(i = 1, 2, \ldots, t)$.

The statistical procedure for achieving these goals is essentially the same for
the two cases. The experimenter takes a predetermined number $N_i$ (depending
on the goal and the problem) of independent observations from the $i$th population.
He computes the $k$ sample variances $s_i^2$ and makes the ranking (4). He
then makes the obvious decision. For Goal I he states that the populations that
gave rise to the $t$ smallest sample variances are the "best" populations, and the
$k - t$ remaining populations are the "worst" populations. For Goal II he states
that the populations that gave rise to the smallest, second smallest, ..., $t$th
smallest sample variances are the "best," "second best," ..., "$t$th best" populations,
respectively, and the $k - t$ remaining populations are the "worst" populations.
The probability of achieving the goal, that is, the probability of a correct
ranking, depends for each goal on the $\theta_{i+1,i}$ $(i = 1, 2, \ldots, k - 1)$ and the
$n_i$ (d.f.) associated with the $s_i^2$ $(i = 1, 2, \ldots, k)$. We shall show, for Goals I
and II, how to determine the $n_i$ so that the experimenter's requirements will be
satisfied.
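The decision rule of Section 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population means are taken as known (so each sample variance has as many degrees of freedom as there are observations), ties are broken by a random key as the randomized procedure suggests, and the function names `sample_variance_known_mean` and `rank_populations` are hypothetical, not taken from the paper.

```python
import random


def sample_variance_known_mean(xs, mu):
    """Unbiased variance estimate when the mean mu is known (d.f. = len(xs))."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def rank_populations(samples, means, t, ordered=False, rng=None):
    """Split k populations into the t 'best' (smallest sample variance) and the rest.

    ordered=False mimics Goal I (the t best are reported as an unordered group);
    ordered=True mimics Goal II (the t best are reported in rank order).
    Population indices are 0-based.
    """
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    variances = [sample_variance_known_mean(xs, mu) for xs, mu in zip(samples, means)]
    # The random second key gives every ordering of tied variances equal probability.
    order = sorted(range(len(samples)), key=lambda i: (variances[i], rng.random()))
    best, worst = order[:t], sorted(order[t:])
    if not ordered:
        best = sorted(best)
    return best, worst


if __name__ == "__main__":
    data = [
        [0.5, -0.5, 1.0, -1.0],   # sample variance 0.625
        [1.0, -1.0, 2.0, -2.0],   # sample variance 2.5
        [0.1, -0.1, 0.2, -0.2],   # sample variance 0.025
    ]
    print(rank_populations(data, [0.0, 0.0, 0.0], t=2))                # Goal I
    print(rank_populations(data, [0.0, 0.0, 0.0], t=2, ordered=True))  # Goal II
```

The same ranking serves both goals; only the way the $t$ selected populations are reported differs.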

2.3. Confidence statements associated with the procedure. It is important to point 
out that if one adopts the procedure described above, it is possible for him to 
make useful confidence statements. These are given below without proof. 

For Goal I if the $n_i$ are chosen so that the probability of a correct ranking is
$P$ when $\theta_{t+1,t} = \theta_{t+1,t}^*$ and $\theta_{t,1} = \theta_{k,t+1} = 1$ (see Section 3.2), then after having
taken the required number of observations, the experimenter can assert with
confidence coefficient at least $P$ that

(5)   $1 \leq \dfrac{\max\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_t^2\}}{\sigma_{[t]}^2} \leq \theta_{t+1,t}^*$

where $\sigma_i^2$ denotes the variance of the population which yielded $s_{[i]}^2$ $(i = 1, 2, \ldots,
t)$. Similarly, for Goal II if the $n_i$ are chosen so that the probability of a correct
ranking is $P$ when $\theta_{i+1,i} = \theta_{i+1,i}^*$ $(i = 1, 2, \ldots, t)$ and $\theta_{k,t+1} = 1$ (see Section
3.2), then after having taken the required number of observations, the experimenter
can assert with confidence coefficient at least $P$ that

(6)   $1/\theta_{i,i-1}^* \leq \sigma_i^2 / \sigma_{[i]}^2 \leq \theta_{i+1,i}^*$

simultaneously for $i = 1, 2, \ldots, t$. (Here $\theta_{1,0}^* = 1$.)
For example, if $t = 1$ the goals coincide and the confidence statement becomes

(7)   $\sigma_1^2 / \sigma_{[1]}^2 \leq \theta_{2,1}^*$.

This statement holds with confidence coefficient at least $P$ regardless of the true
configuration of the population variances. Thus, without knowing whether
$\theta_{2,1} < \theta_{2,1}^*$ or $\theta_{2,1} \geq \theta_{2,1}^*$ the experimenter still can assert with confidence coefficient
at least $P$ that the variance $\sigma_1^2$ of the population which he chose as having
the smallest variance is not greater than $\theta_{2,1}^* \sigma_{[1]}^2$.

It follows from the above that the problem for $t = 1$ (say) could have been
formulated in the following equivalent way: "How many observations must I
take from each population in order that I will be able to assert with confidence
coefficient at least $P$ that the variance of the population that I choose as having
the smallest variance is either the smallest one or at least not greater than
$\theta_{2,1}^*$ times the smallest one?"


3. The probability of a correct ranking expressed as an integral. 
3.1. Arbitrary configuration of the population variances. The probability of a 
correct ranking can be represented as 


(8)   $\Pr[\max\{s_{(1)}^2, \ldots, s_{(t)}^2\} < \min\{s_{(t+1)}^2, \ldots, s_{(k)}^2\}]$,


(9)   $\Pr[s_{(1)}^2 < s_{(2)}^2 < \cdots < s_{(t)}^2 < \min\{s_{(t+1)}^2, \ldots, s_{(k)}^2\}]$


for Goals I and II, respectively. We shall give integral expressions for each of 
these probabilities. 
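We note that (8) can be written as

$\sum_{j=1}^{t} \Pr[\max\{s_{(1)}^2, \ldots, s_{(j-1)}^2, s_{(j+1)}^2, \ldots, s_{(t)}^2\} < s_{(j)}^2 < \min\{s_{(t+1)}^2, \ldots, s_{(k)}^2\}]$

(10)   $= \sum_{j=1}^{t} \Pr\left[\begin{array}{ll} s_{(\alpha)}^2 < s_{(j)}^2 & (\alpha = 1, 2, \ldots, j-1, j+1, \ldots, t) \\ s_{(j)}^2 < s_{(\beta)}^2 & (\beta = t+1, t+2, \ldots, k) \end{array}\right]$

$= \sum_{j=1}^{t} \Pr\left[\begin{array}{ll} \chi_{n_{(\alpha)}}^2 < (n_{(\alpha)}/n_{(j)})\, \theta_{j,\alpha}\, \chi_{n_{(j)}}^2 & (\alpha = 1, 2, \ldots, j-1, j+1, \ldots, t) \\ \chi_{n_{(\beta)}}^2 > (n_{(\beta)}/n_{(j)})\, \theta_{j,\beta}\, \chi_{n_{(j)}}^2 & (\beta = t+1, t+2, \ldots, k) \end{array}\right]$

If for each $j$ the above probability is evaluated for $\chi_{n_{(j)}}^2$ fixed (say at $y$), and the
expectation is taken over $y$, then (10) can be written as

(11)   $\sum_{j=1}^{t} \int_0^\infty \Big\{\prod_{\alpha=1,\, \alpha \neq j}^{t} F_{n_{(\alpha)}}\Big(\dfrac{n_{(\alpha)}}{n_{(j)}}\, \theta_{j,\alpha}\, y\Big)\Big\} \Big\{\prod_{\beta=t+1}^{k} \Big[1 - F_{n_{(\beta)}}\Big(\dfrac{n_{(\beta)}}{n_{(j)}}\, \theta_{j,\beta}\, y\Big)\Big]\Big\} f_{n_{(j)}}(y)\, dy$

where $f_{n_{(i)}}(y)$ and $F_{n_{(i)}}(y)$ are the probability density function (p.d.f.) and
cumulative distribution function (c.d.f.), respectively, of the Gamma variable
$\chi_{n_{(i)}}^2$.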

The probability (8) can be evaluated for arbitrary $n_i$ and $\theta_{i,j}$ $(i, j = 1, 2, \ldots,
k)$ using (11). However, we shall be concerned with the case

(12)   $n_1 = n_2 = \cdots = n_k = n$ (say),

and future probability calculations will be made for this special case.
The probability (8) also can be evaluated using an alternative expression
which we give for the special case (12). The expression is

(13)   $\sum_{j=1}^{t} \int \cdots \int \dfrac{\Gamma(kn/2) \prod_{i=1}^{k-1} u_i^{(n-2)/2}\, du_i}{[\Gamma(n/2)]^k \big(1 + \sum_{i=1}^{k-1} u_i\big)^{nk/2}}$

where the limits of integration for $u_1, u_2, \ldots, u_{k-1}$ are $(0, \theta_{j,1}), (0, \theta_{j,2}), \ldots,
(0, \theta_{j,j-1}), (0, \theta_{j,j+1}), (0, \theta_{j,j+2}), \ldots, (0, \theta_{j,t}), (\theta_{j,t+1}, \infty), (\theta_{j,t+2}, \infty), \ldots,
(\theta_{j,k}, \infty)$, respectively. The above expression is derived by obtaining the
joint distribution of the $k$ independent $s_{(i)}^2$, making the transformation $u_i =
\theta_{j,i}\, s_{(i)}^2 / s_{(j)}^2$ $(i = 1, 2, \ldots, j-1, j+1, j+2, \ldots, k)$, $u_j = n s_{(j)}^2 / 2\sigma_{[j]}^2$,
and integrating out $u_j$ as a Gamma function. Then renaming the $u_i$ to make the
subscripts consecutive, we obtain (13).

The probability (9) can be evaluated using different expressions. The expression
corresponding to (11) will be omitted; the expression corresponding to
(13) is derived in a similar way as (13) and is given by

(14)   $\int \cdots \int \dfrac{\Gamma(kn/2) \prod_{i=1}^{k-1} u_i^{(n-2)/2} \prod_{i=1}^{t-1} u_i^{(k-i-1)n/2} \prod_{i=1}^{k-1} du_i}{[\Gamma(n/2)]^k \Big[\sum_{i=1}^{t} \prod_{\alpha=1}^{i-1} u_\alpha + \Big(\prod_{\alpha=1}^{t-1} u_\alpha\Big) \sum_{i=t}^{k-1} u_i\Big]^{kn/2}}$

where we understand that $\prod_{\alpha=m}^{n} u_\alpha = 1$ if $m > n$, and the limits of integration
for $u_1, \ldots, u_{t-1};\ u_t, \ldots, u_{k-1}$ are $(\theta_{1,2}, \infty), (\theta_{2,3}, \infty), \ldots, (\theta_{t-1,t}, \infty);$
$(\theta_{t,t+1}, \infty), (\theta_{t,t+2}, \infty), \ldots, (\theta_{t,k}, \infty)$, respectively. The density functions
contained in both (13) and (14) are multivariate generalizations of the F-distribution.
The expressions (13) and (14) can be regarded as the operating characteristic
curves with respect to a correct ranking for the procedures of Goals I
and II, respectively.

3.2. Least favorable configuration of the population variances. For both Goal I
and Goal II we are interested in finding the smallest value of $n$ which will guarantee
the requirements specified in Section 2.2. In order to do this it will be convenient
to define a least favorable configuration of the population variances. For
Goal I this configuration is defined by

(15)   $\theta_{t,1} = \theta_{k,t+1} = 1; \qquad \theta_{t+1,t} \geq 1$,

and for Goal II it is defined by

(16)   $\theta_{k,t+1} = 1; \qquad \theta_{i+1,i} \geq 1 \quad (i = 1, 2, \ldots, t)$.

Since the probabilities (8) and (9) obviously are increasing functions of the
$\theta_{i+1,i}$ $(i = 1, 2, \ldots, k - 1)$, we see that in order to guarantee our requirements
it is sufficient to evaluate these probabilities at

(17)   $\theta_{t,1} = \theta_{k,t+1} = 1$ and $\theta_{t+1,t} = \theta_{t+1,t}^* = \theta^*$ (say)

for Goal I, and at

(18)   $\theta_{k,t+1} = 1$ and $\theta_{i+1,i} = \theta_{i+1,i}^* \quad (i = 1, 2, \ldots, t)$

for Goal II. The desired value of $n$ then is the smallest integer which will make
the probabilities, evaluated at these points, equal to or greater than the preassigned
probability specified by the experimenter.
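The search for the smallest $n$ can be illustrated for the simplest case $k = 2$, $t = 1$. The sketch below is not the paper's exact method: it assumes the normal approximation of Section 5, which for $k = 2$ reduces to $\Phi(\tfrac{1}{2}\sqrt{n - 1}\,\log_e \theta^*)$ because the difference of the two log sample variances has variance close to $4/(n - 1)$. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_prob_k2(theta, n):
    """Normal approximation to Pr[correct ranking] for k = 2 with n d.f. per population."""
    return normal_cdf(0.5 * math.sqrt(n - 1) * math.log(theta))


def smallest_n(theta_star, p_star, n_max=100_000):
    """Smallest n whose approximate probability reaches the preassigned p_star."""
    for n in range(1, n_max + 1):
        if approx_prob_k2(theta_star, n) >= p_star:
            return n
    raise ValueError("no n up to n_max satisfies the requirement")


if __name__ == "__main__":
    # theta* = 2.0 and P* = 0.90 are illustrative inputs, not values from the paper.
    print(smallest_n(2.0, 0.90))
```

For $\theta = 1.2$ and $n = 2$ the approximation gives about 0.5363, the value printed as the normal approximation in Table I.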

When (12) and (17) hold, the expressions (11) and (13) simplify considerably
and we obtain

(19)   $t \int_0^\infty (F_n(y))^{t-1} [1 - F_n(y/\theta^*)]^{k-t} f_n(y)\, dy$

(20)   $t \int_{1/\theta^*}^{\infty} \cdots \int_{1/\theta^*}^{\infty} \int_0^1 \cdots \int_0^1 g_n(u_1, \ldots, u_{k-1})\, du_1 \cdots du_{t-1}\, du_t \cdots du_{k-1}$,

respectively, where $g_n(u_1, \ldots, u_{k-1})$ is the same density function as is displayed
in (13). When (12) and (18) hold, there is no corresponding simplification in
(14), but $\theta_{t,i}$ is replaced by $1/\theta_{t+1,t}^*$ $(i = t+1, \ldots, k)$ and $\theta_{i,i+1}$ is replaced
by $1/\theta_{i+1,i}^*$ $(i = 1, 2, \ldots, t - 1)$.
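The single integral (19) is easy to evaluate numerically. The sketch below assumes that $F_n$ and $f_n$ are the c.d.f. and p.d.f. of a chi-square variable with $n$ degrees of freedom, restricts itself to even $n$ (where the c.d.f. is a finite sum), and uses a plain Simpson rule; the function names, the upper integration limit and the step count are illustrative choices, not part of the paper.

```python
import math


def chi2_pdf_even(y, n):
    """Chi-square density for even n = 2m: (y/2)^(m-1) exp(-y/2) / (2 (m-1)!)."""
    m = n // 2
    return (y / 2.0) ** (m - 1) * math.exp(-y / 2.0) / (2.0 * math.factorial(m - 1))


def chi2_cdf_even(y, n):
    """Chi-square c.d.f. for even n = 2m: 1 - exp(-y/2) * sum_{i<m} (y/2)^i / i!."""
    m = n // 2
    term, total = 1.0, 0.0
    for i in range(m):
        total += term
        term *= (y / 2.0) / (i + 1)
    return 1.0 - math.exp(-y / 2.0) * total


def prob_goal1_lfc(k, t, n, theta_star, upper=200.0, steps=20000):
    """Simpson-rule evaluation of t * int F^(t-1)(y) [1 - F(y/theta*)]^(k-t) f(y) dy."""
    if n <= 0 or n % 2:
        raise ValueError("this sketch handles even n only")
    h = upper / steps

    def integrand(y):
        return (chi2_cdf_even(y, n) ** (t - 1)
                * (1.0 - chi2_cdf_even(y / theta_star, n)) ** (k - t)
                * chi2_pdf_even(y, n))

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return t * total * h / 3.0


if __name__ == "__main__":
    print(round(prob_goal1_lfc(2, 1, 4, 1.2), 5))
    print(round(prob_goal1_lfc(3, 1, 2, 1.2), 5))
```

For $k = 2$ the result can be compared with Table I; for $n = 2$ the integrand is a product of exponentials, so the value can also be checked by hand.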


4. Evaluation of the probability integrals. When $n$ is even, the integrals (14),
(19), and (20) can be evaluated in a straightforward manner, and the results
can be expressed as rational functions of the $\theta_{i,j}$. However, when $k \geq 3$ this
method of evaluation becomes increasingly tedious as $n$ increases and is inefficient
even for small values of $n$. In some cases the probabilities also can be expressed
as a finite sum of incomplete Beta functions, and using [12] the computations
can be simplified in some cases.

When $n$ is odd the integrations are more involved. For $k = 2$ and 3 the results
can be expressed in terms of rational functions and inverse trigonometric
functions of the $\theta_{i,j}$; for $k \geq 4$ and $n$ odd, no results were obtained.

For $k = 2$, the probabilities (13) and (14) coincide and are given by the incomplete
Beta function $I_{\theta_{2,1}/(1+\theta_{2,1})}(n/2, n/2)$.
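For even $n$ the incomplete Beta function in the $k = 2$ case has integer arguments and reduces to a binomial tail sum, $I_x(m, m) = \sum_{j=m}^{2m-1} \binom{2m-1}{j} x^j (1 - x)^{2m-1-j}$ with $m = n/2$ and $x = \theta_{2,1}/(1 + \theta_{2,1})$. The following sketch uses that identity; the function name is hypothetical and odd $n$ is deliberately not handled.

```python
import math


def prob_correct_k2_even(theta, n):
    """Pr[s_(1)^2 < s_(2)^2] = I_x(n/2, n/2), x = theta/(1+theta), for even n."""
    if n <= 0 or n % 2:
        raise ValueError("this sketch handles even n only")
    m = n // 2
    x = theta / (1.0 + theta)
    r = 2 * m - 1
    # Regularized incomplete beta with integer arguments as a binomial tail sum.
    return sum(math.comb(r, j) * x ** j * (1.0 - x) ** (r - j) for j in range(m, r + 1))


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, round(prob_correct_k2_even(1.2, n), 5))
```

The values for $n = 2$ and $n = 4$ at $\theta = 1.2$ agree with the corresponding entries of Table I.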

For $k = 3$ and $n$ even, the probabilities (13) and (14) can be expressed in
terms of finite sums of incomplete Beta functions. We give here three such sums.
For Goal I when (15) holds and $\theta_{t+1,t} = \theta$, we have for $t = 1$ and 2, respectively


27 co41y/cosm(n, n/2)Teja40(n/2, n/2) 
(21) n/2 
= Z b(n/2 —Ii2—1 ~j, 1/2)Tojo+2(n/2 +j,n— D 


j=l 
27 142) (n/2, n)Toja4(n/2, n/2) 
99) 


n/2 
— > b(n/2 — 1;n + 3 — 2, 1/2)T e024 (n + 7 — 1, n/2 —7 + 1) 


j=l 
For Goal II when (16) holds and $\theta_{3,2} = \theta_{2,1} = \theta$, we have


T42)c140402)(n/2, n)Ioyaze)(n/2, n/2) 


(23) ~0(1 + 6)? ¥ b(n/2 — 1; n +7 — 2,0/01 +8) 


j=l 
+ T o402)/a40409(n + j — 1, n/2 — 7 + 1). 


In the above, the symbol $b(x; r, p)$ is the binomial probability and is equal to
$C_x^r\, p^x (1 - p)^{r - x}$.

For $k = 3$ and $n$ odd, general formulas were obtained for the probabilities
(13) and (14), but for simplicity we shall give the results only for $n = 1$ and 3.
For Goal I when (15) holds and $\theta_{t+1,t} = \theta$, we have
(24)   $\dfrac{2}{\pi} \operatorname{arc\,tan} \left\{\dfrac{\sqrt{\theta}\, [2(\theta - 1) + \sqrt{\theta + 2}]}{4\theta - 1}\right\} \qquad (t = 1,\ n = 1)$


/ 


2 Vo (200-1) + V0+2]\ , 40-1) 
are tan { ——— = >) + —__—___— 
T 


40 — 1 J 70 + 1)? 


2V' (6 — 1)" + 70 + 8) 
r(0 + 12 + 0) 


(26)   $\dfrac{2}{\pi} \operatorname{arc\,tan} \dfrac{\theta}{\sqrt{1 + 2\theta}} \qquad (t = 2,\ n = 1)$


8 ie ta em + BE DOT 4 Be + 2) 
rc tan es y+ +) 
7 V1 + 20 TO > 1) + 1 
For Goal II when (16) holds and $\theta_{3,2} = \theta_{2,1} = \theta$, we have


(28)   $\dfrac{2}{\pi} \operatorname{arc\,tan} \left\{\sqrt{\theta}\, [\theta + 1 - \sqrt{\theta^2 + \theta + 1}]\right\}$


= are tan {V/o [0 +1 — V# +064 1)} 
(29) 


a Se Smear 
w(O + 1)? (@ + 6 + 1)5? 
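For $n = 1$ and $k = 3$ the three probabilities can be cross-checked numerically. The sketch below is an independent check and not part of the paper: with one degree of freedom each sample variance is a scaled squared normal variable, so each event is the solid angle of a spherical polygon, which gives the alternative arcsine and arccosine forms used here. The arc tan expressions are a reading of the printed formulas for Goal I ($t = 1$ and $t = 2$) and Goal II; all function names are hypothetical.

```python
import math

TWO_OVER_PI = 2.0 / math.pi


def goal1_t1_n1(theta):
    """Goal I, k = 3, t = 1, n = 1 (arc tan form)."""
    num = math.sqrt(theta) * (2.0 * (theta - 1.0) + math.sqrt(theta + 2.0))
    return TWO_OVER_PI * math.atan(num / (4.0 * theta - 1.0))


def goal1_t2_n1(theta):
    """Goal I, k = 3, t = 2, n = 1 (arc tan form)."""
    return TWO_OVER_PI * math.atan(theta / math.sqrt(1.0 + 2.0 * theta))


def goal2_n1(theta):
    """Goal II, k = 3, theta_{3,2} = theta_{2,1} = theta, n = 1 (arc tan form)."""
    root = math.sqrt(theta * theta + theta + 1.0)
    return TWO_OVER_PI * math.atan(math.sqrt(theta) * (theta + 1.0 - root))


# Solid-angle versions, derived separately as a check (assumption of this sketch).
def goal1_t1_n1_alt(theta):
    return TWO_OVER_PI * (2.0 * math.atan(math.sqrt(theta)) - math.acos(1.0 / (1.0 + theta)))


def goal1_t2_n1_alt(theta):
    return TWO_OVER_PI * math.asin(theta / (1.0 + theta))


def goal2_n1_alt(theta):
    r = math.sqrt(theta)
    return TWO_OVER_PI * (math.atan(r) - math.asin(r / (1.0 + theta)))


if __name__ == "__main__":
    for th in (1.0, 1.4, 2.2):
        print(th, goal1_t1_n1(th), goal1_t2_n1(th), goal2_n1(th))
```

At $\theta = 1$ the three values are $1/3$, $1/3$ and $1/6$, as symmetry among three identical populations requires.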


Because of their simplicity we give in addition two general results for $n = 2$.
For Goal I when (15) holds we have


(30)   $t!\, \Big\{\prod_{i=1}^{t} \Big(\dfrac{k - t}{\theta_{t+1,t}} + i\Big)\Big\}^{-1}$


4 20 (6 — 1) | (9 — ne tet ete) 


while for Goal II when (16) holds we have


(31)   $\Big[\prod_{i=1}^{t} \sum_{j=i}^{k} \theta_{i,j}\Big]^{-1}$.
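The two $n = 2$ results are simple enough to code directly. The sketch below relies on a reading of the formulas that can be justified independently: with two degrees of freedom each sample variance is exponentially distributed, so the probability that the smallest values come from the right populations is a product of "first to occur" ratios. The function names and the way the variances are passed in are illustrative assumptions.

```python
from math import factorial


def goal1_n2(k, t, theta):
    """Goal I, n = 2, least favorable configuration: t! / prod_{i=1..t} ((k-t)/theta + i)."""
    p = float(factorial(t))
    for i in range(1, t + 1):
        p /= (k - t) / theta + i
    return p


def goal2_n2(ranked_variances, t):
    """Goal II, n = 2: 1 / prod_{i=1..t} sum_{j=i..k} theta_{i,j}.

    ranked_variances lists the population variances in increasing order, and
    theta_{i,j} is the ratio of the i-th to the j-th ranked variance.
    """
    k = len(ranked_variances)
    product = 1.0
    for i in range(t):
        product *= sum(ranked_variances[i] / ranked_variances[j] for j in range(i, k))
    return 1.0 / product


if __name__ == "__main__":
    print(goal1_n2(2, 1, 1.2))          # k = 2 case, theta / (1 + theta)
    print(goal2_n2([1.0, 1.2], 1))      # same case through the Goal II formula
    print(goal2_n2([1.0, 1.0, 1.0], 2)) # three identical populations
```

For $k = 2$ both functions reduce to $\theta/(1 + \theta)$, the $n = 2$ row of Table I.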
5. Large sample approximation to the probability. In Section 4 we pointed out
that it is extremely tedious to compute exact probabilities when $n$ is even and
large, and that when $n$ is odd these difficulties multiply considerably even for
small $n$. In this section we shall show how large-sample theory can be used to find
very good approximations to the required probabilities even for relatively small $n$.
We shall illustrate the method using a particular problem. The extension of
the method to the general problem will be straightforward. Our principal tools
will be the use of the transformation $y = \log_e s^2$ (see [2]), and the approach of
certain multivariate distributions to multivariate normal distributions.
As our particular problem we shall consider Goal I for $k = 3$, $t = 1$ when
(12) holds. Letting

(32)   $X_i = \log_e (s_{(i)}^2 / \sigma_{[i]}^2)$

we see that we can write the probability of achieving our goal as

(33)   $\Pr[s_{(1)}^2 < s_{(2)}^2,\ s_{(1)}^2 < s_{(3)}^2] = \Pr[X_2 - X_1 > -\log_e \theta_{2,1},\ X_3 - X_1 > -\log_e \theta_{3,1}]$.

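Now it can be shown (see [2]) that the expectation and variance are

(34)   $E\{X_i\} = -\Big(\dfrac{1}{n} + \dfrac{1}{3n^2}\Big) + O\Big(\dfrac{1}{n^4}\Big)$

(35)   $\operatorname{Var}\{X_i\} = \dfrac{d^2}{dx^2} [\log_e \Gamma(x)] \Big|_{x = n/2}$

so that

$E\{X_i - X_1\} = 0$

(36)   $\operatorname{Var}\{X_i - X_1\} = 2\Big(\dfrac{2}{n} + \dfrac{2}{n^2} + \dfrac{4}{3n^3}\Big) + O\Big(\dfrac{1}{n^5}\Big) \cong \dfrac{4}{n - 1} \qquad (i = 2, 3)$

Correlation $\{X_2 - X_1, X_3 - X_1\} = \tfrac{1}{2}$.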
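The variance of $X_i$ is the trigamma function at $n/2$, which for integer and half-integer arguments can be computed exactly from the known values $\psi'(1) = \pi^2/6$ and $\psi'(1/2) = \pi^2/2$ and the recurrence $\psi'(x + 1) = \psi'(x) - 1/x^2$. The sketch below uses this to compare the exact variance with the series $2/n + 2/n^2 + 4/(3n^3)$ and with the working approximation $2/(n - 1)$; the function names are hypothetical.

```python
import math


def trigamma_half_integer(two_x):
    """Trigamma psi'(x) at x = two_x / 2 for a positive integer two_x."""
    if two_x <= 0:
        raise ValueError("two_x must be a positive integer")
    if two_x % 2 == 0:
        x, value = 1.0, math.pi ** 2 / 6.0   # psi'(1)
    else:
        x, value = 0.5, math.pi ** 2 / 2.0   # psi'(1/2)
    target = two_x / 2.0
    while x < target - 1e-9:
        value -= 1.0 / (x * x)               # psi'(x + 1) = psi'(x) - 1/x^2
        x += 1.0
    return value


def var_log_sample_variance(n):
    """Var{X_i} for a sample variance with n degrees of freedom."""
    return trigamma_half_integer(n)


if __name__ == "__main__":
    for n in (5, 10, 20):
        exact = var_log_sample_variance(n)
        series = 2.0 / n + 2.0 / n ** 2 + 4.0 / (3.0 * n ** 3)
        print(n, exact, series, 2.0 / (n - 1))
```

Doubling these values gives the variance of a difference $X_i - X_1$, which is what the approximation $4/(n - 1)$ refers to.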
Using the method of characteristic functions, it can be shown that the joint
distribution of the chance variables

(37)   $Y_i = \sqrt{(n - 1)/4}\; (X_{i+1} - X_1) \qquad (i = 1, 2)$

approaches the bivariate normal distribution with means zero, variances unity,
and correlation coefficient plus one-half. Thus the probability is given approximately
by

(38)   $\int_{-\frac{1}{2}\sqrt{n-1}\, \log_e \theta_{2,1}}^{+\infty} \int_{-\frac{1}{2}\sqrt{n-1}\, \log_e \theta_{3,1}}^{+\infty} \dfrac{1}{\pi\sqrt{3}}\, e^{-\frac{2}{3}(y_1^2 - y_1 y_2 + y_2^2)}\, dy_1\, dy_2$.

This integral is tabulated [11]. When (17) holds the common value of the two
lower limits is

(39)   $-\tfrac{1}{2}\sqrt{n - 1}\, \log_e \theta^*$.


More generally, for Goal I when (12) and (15) hold, the probability (8) can be
expressed as

(40)   $t \int_{-\infty}^{\infty} [G_n(y)]^{t-1} \big[1 - G_n\big(y - \sqrt{\tfrac{1}{2}(n - 1)}\, \log_e \theta_{t+1,t}\big)\big]^{k-t} g_n(y)\, dy$

where $g_n(y)$ and $G_n(y)$ are the common p.d.f. and c.d.f., respectively, of the
chance variables $\sqrt{\tfrac{1}{2}(n - 1)}\, X_i$ $(i = 1, 2, 3)$. Since $g_n(y)$ and $G_n(y)$ approach the
p.d.f. $f(y)$ and c.d.f. $F(y)$ of the standardized normal chance variable, it follows
that the expression (40) approaches

(41)   $t \int_{-\infty}^{\infty} F^{t-1}(y)\, [1 - F(y - d)]^{k-t} f(y)\, dy$

where $d = \sqrt{\tfrac{1}{2}(n - 1)}\, \log_e \theta_{t+1,t}$. A tabulation⁴ [9] of (41) has been made as a
function of $d$ for certain pairs $(t, k)$. These tables therefore can be used to find
an approximation to the probability (40). The reader should note that (41)
can also be written as

(42)   $(k - t) \int_{-\infty}^{\infty} F^{t}(y + d)\, [1 - F(y)]^{k-t-1} f(y)\, dy$

and it is this expression which appears in [9].
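The normal-theory integral can be evaluated numerically in place of a table lookup. The sketch below assumes the two equivalent forms $t \int F^{t-1}(y)[1 - F(y - d)]^{k-t} f(y)\, dy$ and $(k - t) \int F^{t}(y + d)[1 - F(y)]^{k-t-1} f(y)\, dy$ with $d = \sqrt{(n - 1)/2}\, \log_e \theta$, and integrates both with a Simpson rule over $[-12, 12]$; the function names, integration limits and step count are illustrative choices.

```python
import math


def phi(y):
    """Standard normal p.d.f."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def simpson(f, a, b, steps=4000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def shift(n, theta):
    """d = sqrt((n - 1) / 2) * log(theta)."""
    return math.sqrt(0.5 * (n - 1)) * math.log(theta)


def approx_goal1(k, t, n, theta):
    """First form: condition on the largest of the t 'best' standardized variables."""
    d = shift(n, theta)
    return t * simpson(
        lambda y: Phi(y) ** (t - 1) * (1.0 - Phi(y - d)) ** (k - t) * phi(y), -12.0, 12.0)


def approx_goal1_alt(k, t, n, theta):
    """Second form: condition on the smallest of the k - t 'worst' variables."""
    d = shift(n, theta)
    return (k - t) * simpson(
        lambda y: Phi(y + d) ** t * (1.0 - Phi(y)) ** (k - t - 1) * phi(y), -12.0, 12.0)


if __name__ == "__main__":
    print(round(approx_goal1(2, 1, 2, 1.2), 4))
    print(round(approx_goal1(3, 1, 10, 1.6), 4), round(approx_goal1_alt(3, 1, 10, 1.6), 4))
```

For $k = 2$, $t = 1$, $n = 2$, $\theta = 1.2$ the result is about 0.5363, matching the normal approximation printed in Table I, and the two forms agree to numerical precision.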


⁴ These tables were computed by the National Bureau of Standards at the Institute for
Numerical Analysis, Los Angeles. They are the basic tables from which Table I in [3] was
derived.

6. Tables. Tables of the probability of a correct ranking have been prepared
to assist the experimenter in designing experiments for ranking variances. All 
of the tables are computed for the case when n is the same for each population, 
and the least favorable configuration of the population variances holds, that is, 
(15) holds for Goal I and (16) holds for Goal II. The following key describes the 


tables. 


Value of 
Table Number | Goal Number Conditions on the 6;,; 


I I or IT : Oo, 
I] I or II ‘ 63, 
III J ‘ d 63, 
LIV II ‘ , 63 
V [ 44, 


All of the tables give the probabilities for $n = 1(1)20$ and $\theta = 1.0(0.2)2.2$.

Two probabilities are given in each cell of the tables (except for some of the
cells in Table V). The correct probability, $P_c(\theta, n)$, is given to five decimal
places; the normal approximation, $P_n(\theta, n)$, (see Section 5) to the correct probability,
is given to four decimal places. The purpose of giving $P_n(\theta, n)$ is to indicate
the magnitude of the error of the approximation, and to show for various
goals, $k$, and $t$ how this error varies as a function of $\theta$ and $n$. The magnitude of
the error cannot be judged for most of the $P_n(\theta, n)$ in Table V since the $P_c(\theta, n)$
are given only for $n = 2(2)12$. Formulae had been developed for the computation
of these $P_c(\theta, n)$ for $n$ even, but such computations were found to be too
laborious for $n > 12$; no similar formulae had been developed for $n$ odd.

If we let $D(\theta, n) = P_c(\theta, n) - P_n(\theta, n)$, then the following properties would
appear to hold for all of the tables: 1) For any fixed $\theta$, $\lim_{n \to \infty} D(\theta, n) = 0$; 2)
For all $n$, $D(1, n) = 0$; 3) For any fixed $n$, $D(\theta, n)$ is continuous in $\theta$; and 4)
For any fixed $n > 1$, $\lim_{\theta \to \infty} D(\theta, n) = 0$.

Based on the behavior of $P_c(\theta, n)$ and $P_n(\theta, n)$ in the range computed, the
following additional properties would appear to hold:

(a) For Tables I, III, and IV: 1) For all $n$ and $\theta > 1$, $D(\theta, n) > 0$; 2) For
any fixed $n > 1$ there exists a value $\theta_n^0$ of $\theta$ such that $D(\theta, n)$ is strictly increasing
for $1 < \theta < \theta_n^0$ and strictly decreasing for $\theta > \theta_n^0$, while for $n = 1$, $\theta_1^0 = \infty$;
3) $\theta_n^0$ is strictly decreasing with $n$; 4) For any fixed $\theta > 1$, $D(\theta, n)$ is strictly
decreasing with $n$; and 5) $\max_\theta D(\theta, n)$ is strictly decreasing with $n$.

(b) For Tables II and V: 1) For $\theta > 1$, $D(\theta, 1) > 0$; 2) For any fixed $\theta > 1$
there exists a value $n_\theta^0$ of $n$ such that $D(\theta, n)$ is strictly decreasing for $n < n_\theta^0$
($D(\theta, n_\theta^0) < 0$) and strictly increasing for $n > n_\theta^0$; and 3) $n_\theta^0$ is strictly decreasing
with $\theta$.

The normal approximation is the same for Tables II and III. In general, for 
fixed k the normal approximation to the probability of achieving Goal I will be 





TABLE I

Probability* of a correct ranking as a function of the true variance ratio $\theta$ and the number of
degrees of freedom (n) from each population:
$P\{s^2_{(1)} < s^2_{(2)}\}$. True variance ratio: $\sigma^2_{[2]}/\sigma^2_{[1]} = \theta$

Degrees of
freedom                              $\theta$
  (n)      1.0      1.2      1.4      1.6      1.8      2.0      2.2

   1     0.50000  0.52898  0.55330  0.57412  0.59223  0.60817  0.62236
         0.5000   0.5000   0.5000   0.5000   0.5000   0.5000   0.5000

   2     0.50000  0.54545  0.58333  0.61538  0.64286  0.66667  0.68750
         0.5000   0.5363   0.5668   0.5929   0.6156   0.6355   0.6533

   3     0.50000  0.55779  0.60561  0.64560  0.67938  0.70821  0.73301
         0.5000   0.5513   0.5940   0.6302   0.6612   0.6880   0.7114

   4     0.50000  0.56799  0.62384  0.67000  0.70845  0.74074  0.76807
         0.5000   0.5627   0.6146   0.6580   0.6946   0.7258   0.7526

   5     0.50000  0.57685  0.63951  0.69071  0.73274  0.76749  0.79641
         0.5000   0.5723   0.6317   0.6808   0.7217   0.7559   0.7848

   6     0.50000  0.58476  0.65338  0.70879  0.75364  0.79012  0.81999
         0.5000   0.5808   0.6466   0.7004   0.7445   0.7808   0.8110

   7     0.50000  0.59196    ...    0.72488  0.77195  0.80964  0.83999
         0.5000   0.5883   0.65??   0.7176   0.7642   0.8020   0.8329

   8     0.50000  0.59861  0.677??  0.73939  0.78822  0.82670  0.85718
         0.5000   0.5953   0.6718   0.7329   0.7816   0.8204   0.8515

   9     0.50000  0.60481    ...    0.75260  0.80281  0.84176  0.87210
         0.5000   0.6017   0.6824   0.7469   0.7971   0.8365   0.8676

  10     0.50000  0.61064    ...    0.76473  0.81600  0.85515  0.88515
         0.5000   0.6078     ...    0.7596   0.8110   0.8508   0.8815

  11     0.50000  0.61614    ...    0.77594  0.82800  0.86714  0.89662
         0.5000   0.6134   0.70??   0.7713   0.8237   0.8635   0.8937

  12     0.50000  0.62137    ...    0.78633  0.83897  0.87791  0.90677
         0.5000   0.6188     ...    0.7821   0.8352   0.8748   0.9045

  13     0.50000  0.62635    ...    0.79602  0.84903  0.88765  0.91578
         0.5000   0.6239     ...    0.7922   0.8457   0.8850   0.9140

  14     0.50000  0.63112  0.73136  0.80508  0.85830  0.89646  0.92381
         0.5000   0.6288   0.7274   0.8016   0.8553   0.8943   0.9224

  15     0.50000  0.63570  0.73870  0.81357  0.86686  0.90447  0.93098
         0.5000   0.6335   0.7355   0.8104   0.8643   0.9026   0.9299

  16     0.50000  0.64011  0.74569  0.82155  0.87479  0.91177  0.93741
         0.5000   0.6380   0.7427   0.8186   0.8725   0.9102   0.9366

  17     0.50000  0.64436  0.75237  0.82908  0.88214  0.91843  0.94317
         0.5000   0.6423   0.7495   0.8264   0.8801   0.9172   0.9426

  18     0.50000  0.64846  0.75875  0.83618  0.88898  0.92452  0.94836
         0.5000   0.6465   0.7561   0.8337   0.8872   0.9235   0.9480

  19     0.50000  0.65243  0.76487  0.84290  0.89535  0.93011  0.95303
         0.5000   0.6505   0.7623   0.8406   0.8938   0.9293   0.9528

  20     0.50000  0.65629  0.77075  0.84926  0.90129  0.93523  0.95725
         0.5000   0.6544   0.7683   0.8472   0.8999   0.9346   0.9571

* The five- and four-decimal place numbers in the body of the table are the correct
probabilities and normal approximations to the correct probabilities, respectively.
TABLE II

Probability* of a correct ranking as a function of the true variance ratio $\theta$ and the number of
degrees of freedom (n) from each population:
$P\{s^2_{(1)} < \min(s^2_{(2)}, s^2_{(3)})\}$. True variance ratio: $\sigma^2_{[2]}/\sigma^2_{[1]} = \sigma^2_{[3]}/\sigma^2_{[1]} = \theta$

Degrees of
freedom                              $\theta$
  (n)      1.0      1.2      1.4      1.6      1.8      2.0      2.2

   1     0.33333  0.35835  0.38020    ...    0.41696  0.43269  0.44705
         0.3333   0.3333   0.3333   0.3333   0.3333   0.3333   0.3333

   2     0.33333  0.37500  0.41176    ...    0.47368  0.50000  0.52381
         0.3333   0.3704   0.4027   0.4312   0.4567   0.4795   0.5003

   3     0.33333  0.38792  0.43633    ...    0.51746  0.55149  0.58192
         0.3333   0.3861   0.4325   0.4734   0.5096   0.5420   0.5710

   4     0.33333  0.39880  0.45702    ...    0.55379  0.59375  0.62901
         0.3333   0.3983   0.4556   0.5059   0.5501   0.5892   0.6237

   5     0.33333  0.40838  0.47519  0.53???  0.58515  0.62976  0.66858
         0.3333   0.4087   0.4752   0.5333   0.5839   0.6280   0.6664

   6     0.33333  0.41703  0.49156  0.5566?  0.61284  0.66113  0.70254
         0.3333   0.4179   0.4925   0.5572   0.6131   0.6611   0.7023

   7     0.33333  0.42498  0.50654  0.577??  0.63766  0.68886  0.73209
         0.3333   0.4262   0.5081   0.5787   0.6389   0.6900   0.7331

   8     0.33333  0.43238  0.52042    ...    0.66016  0.71362  0.75807
         0.3333   0.4339   0.5225   0.5982   0.6621   0.7155   0.7599

   9     0.33333  0.43933  0.53339  0.61???  0.68070  0.73590  0.78106
         0.3333   0.4411   0.5358   0.6162   0.6832   0.7383   0.7835

  10     0.33333  0.44590  0.54559  0.6299?  0.69956  0.75605  0.80153
         0.3333   0.4478   0.5483   0.6328   0.7024   0.7588   0.8043

  11     0.33333  0.45209  0.55713  0.645??  0.71697  0.77436  0.81982
         0.3333   0.4542   0.5600   0.6483   0.7200   0.7774   0.8228

  12     0.33333  0.45813  0.56808  0.659??  0.73310  0.79106  0.83622
         0.3333   0.4603   0.5712   0.6628   0.7363   0.7942   0.8393

  13     0.33333  0.46386  0.57852  0.67???  0.74809  0.80633  0.85097
         0.3333   0.4662   0.5818   0.6765   0.7514   0.8096   0.8541

  14     0.33333  0.46937  0.58849  0.68???  0.76205  0.82033  0.86427
         0.3333   0.4718   0.5918   0.6893   0.7655   0.8236   0.8674

  15     0.33333  0.47469  0.59805    ...    0.77508  0.83319  0.87628
         0.3333   0.4772   0.6015   0.7015   0.7785   0.8365   0.8794

  16     0.33333  0.47983  0.60723  0.709??  0.78727  0.84503  0.88715
         0.3333   0.4824   0.6107   0.7130   0.7907   0.8483   0.8902

  17     0.33333  0.48481  0.61606    ...    0.79869  0.85594  0.89699
         0.3333   0.4874   0.6196   0.7238   0.8022   0.8592   0.9000

  18     0.33333  0.48965  0.62456    ...    0.80940  0.86601  0.90592
         0.3333   0.4923   0.6282   0.7343   0.8128   0.8692   0.9088

  19     0.33333  0.49434  0.63277    ...    0.81947  0.87532  0.91403
         0.3333   0.4971   0.6364   0.7442   0.8229   0.8784   0.9168

  20     0.33333  0.49892  0.64069  0.74994  0.82893  0.88393  0.92140
         0.3333   0.5017   0.6444   0.7536   0.8323   0.8869   0.9241

* The five- and four-decimal place numbers in the body of the table are the correct
probabilities and normal approximations to the correct probabilities, respectively.
TABLE III

Probability* of a correct ranking as a function of the true variance ratio $\theta$ and the number of
degrees of freedom (n) from each population:
$P\{\max(s^2_{(1)}, s^2_{(2)}) < s^2_{(3)}\}$. True variance ratio: $\sigma^2_{[3]}/\sigma^2_{[1]} = \sigma^2_{[3]}/\sigma^2_{[2]} = \theta$

Degrees of
freedom                              $\theta$
  (n)      1.0      1.2      1.4      1.6      1.8      2.0      2.2

   1     0.33333  0.36729  0.39650  0.42200  0.44450  0.46456  0.48258
         0.3333   0.3333   0.3333   0.3333   0.3333   0.3333   0.3333

   2     0.33333  0.38503  0.42982  0.46886  0.50311  0.53333  0.56019
         0.3333   0.3704   0.4027   0.4312   0.4567   0.4795   0.5003

   3     0.33333  0.39827  0.45473  0.50372  0.54633  0.58350  0.61609
         0.3333   0.3861   0.4325   0.4734   0.5096   0.5420   0.5710

   4     0.33333  0.40927  0.47539  0.53244  0.58156  0.62388  0.66046
         0.3333   0.3983   0.4556   0.5059   0.5501   0.5892   0.6237

   5     0.33333  0.41889  0.49340  0.55726  0.61165  0.65790  0.69730
         0.3333   0.4087   0.4752   0.5333   0.5839   0.6280   0.6664

   6     0.33333  0.42755  0.50954  0.57929  0.63803  0.68732  0.72867
         0.3333   0.4179   0.4925   0.5572   0.6131   0.6611   0.7023

   7     0.33333  0.43549  0.52426  0.59920  0.66157  0.71318  0.75583
         0.3333   0.4262   0.5081   0.5787   0.6389   0.6900   0.7331

   8     0.33333  0.44286  0.53787  0.61741  0.68282  0.73618  0.77960
         0.3333   0.4339   0.5225   0.5982   0.6621   0.7155   0.7599

   9     0.33333  0.44978  0.55056  0.63421  0.70216  0.75681  0.80058
         0.3333   0.4411   0.5358   0.6162   0.6832   0.7383   0.7835

  10     0.33333  0.45632  0.56247  0.64981  0.71988  0.77542  0.81920
         0.3333   0.4478   0.5483   0.6328   0.7024   0.7588   0.8043

  11     0.33333  0.46253  0.57372  0.66438  0.73620  0.79229  0.83582
         0.3333   0.4542   0.5600   0.6483   0.7200   0.7774   0.8228

  12     0.33333  0.46846  0.58438  0.67804  0.75130  0.80766  0.85071
         0.3333   0.4603   0.5712   0.6628   0.7363   0.7942   0.8393

  13     0.33333  0.47414  0.59453  0.69089  0.76530  0.82170  0.86409
         0.3333   0.4662   0.5818   0.6765   0.7514   0.8096   0.8541

  14     0.33333  0.47960  0.60423  0.70302  0.77832  0.83455  0.87614
         0.3333   0.4718   0.5918   0.6893   0.7655   0.8236   0.8674

  15     0.33333  0.48487  0.61350  0.71449  0.79047  0.84636  0.88702
         0.3333   0.4772   0.6015   0.7015   0.7785   0.8365   0.8794

  16     0.33333  0.48996  0.62241  0.72537  0.80182  0.85721  0.89686
         0.3333   0.4824   0.6107   0.7130   0.7907   0.8483   0.8902

  17     0.33333  0.49490  0.63096  0.73570  0.81245  0.86721  0.90578
         0.3333   0.4874   0.6196   0.7239   0.8022   0.8592   0.9000

  18     0.33333  0.49968  0.63919  0.74553  0.82241  0.87644  0.91388
         0.3333   0.4923   0.6282   0.7343   0.8128   0.8692   0.9088

  19     0.33333  0.50433  0.64713  0.75489  0.83176  0.88496  0.92123
         0.3333   0.4971   0.6364   0.7442   0.8229   0.8784   0.9168

  20     0.33333  0.50885  0.65480  0.76382  0.84055  0.89285  0.92791
         0.3333   0.5017   0.6444   0.7536   0.8323   0.8869   0.9241

* The five- and four-decimal place numbers in the body of the table are the correct
probabilities and normal approximations to the correct probabilities, respectively.
TABLE IV

Probability* of a correct ranking as a function of the true variance ratio $\theta$ and the number of
degrees of freedom (n) from each population:
$P\{s^2_{(1)} < s^2_{(2)} < s^2_{(3)}\}$. True variance ratio: $\sigma^2_{[2]}/\sigma^2_{[1]} = \sigma^2_{[3]}/\sigma^2_{[2]} = \theta$

Degrees of
freedom                              $\theta$
  (n)      1.0      1.2      1.4      1.6      1.8      2.0      2.2

   1     0.16667  0.19716  0.22510  0.25067  0.27412  0.29567  0.31554
         0.1667   0.1667   0.1667   0.1667   0.1667   0.1667   0.1667

   2     0.16667  0.21578  0.26223  0.30531  0.34484  0.38095  0.41387
         0.1667   0.2053   0.2412   0.2744   0.3051   0.3304   0.3597

   3     0.16667  0.23033  0.29168  0.34869  0.40061  0.44740  0.48934
         0.1667   0.2225   0.2759   0.3257   0.3716   0.4136   0.4519

   4     0.16667  0.24273  0.31696  0.38572  0.44763  0.50251  0.55078
         0.1667   0.2362   0.3037   0.3668   0.4244   0.4762   0.5227

   5     0.16667  0.25379  0.33951  0.41846  0.48857  0.54962  0.60229
         0.1667   0.2480   0.3279   0.4022   0.4691   0.5284   0.5804

   6     0.16667  0.26388  0.36009  0.44799  0.52488  0.59063  0.64624
         0.1667   0.2587   0.3497   0.4337   0.5083   0.5732   0.6289

   7     0.16667  0.27325  0.37912  0.47495  0.55749  0.62675  0.68416
         0.1667   0.2684   0.3696   0.4622   0.5432   0.6122   0.6703

   8     0.16667  0.28206  0.39693  0.49984  0.58703  0.65883  0.71719
         0.1667   0.2775   0.3882   0.4884   0.5746   0.6466   0.7061

   9     0.16667  0.29040  0.41368  0.52292  0.61394  0.68750  0.74613
         0.1667   0.2861   0.4055   0.5125   0.6030   0.6772   0.7372

  10     0.16667  0.29834  0.42953  0.54443  0.63857  0.71323  0.77162
         0.1667   0.2943   0.4219   0.5350   0.6290   0.7046   0.7644

  11     0.16667  0.30594  0.44459  0.56456  0.66121  0.73645  0.79417
         0.1667   0.3021   0.4374   0.5559   0.6528   0.7291   0.7884

  12     0.16667  0.31326  0.45896  0.58347  0.68209  0.75745  0.81419
         0.1667   0.3096   0.4522   0.5756   0.6746   0.7513   0.8096

  13     0.16667  0.32031  0.47269  0.60127  0.70139  0.77650  0.83201
         0.1667   0.3168   0.4663   0.5940   0.6948   0.7713   0.8284

  14     0.16667  0.32714  0.48585  0.61806  0.71927  0.79383  0.84794
         0.1667   0.3237   0.4798   0.6114   0.7134   0.7895   0.8451

  15     0.16667  0.33375  0.49849  0.63393  0.73587  0.80963  0.86219
         0.1667   0.3304   0.4928   0.6278   0.7307   0.8060   0.8600

  16     0.16667  0.34018  0.51065  0.64895  0.75131  0.82405  0.87497
         0.1667   0.3370   0.5052   0.6433   0.7467   0.8210   0.8733

  17     0.16667  0.34643  0.52236  0.66319  0.76569  0.83725  0.88646
         0.1667   0.3433   0.5172   0.6580   0.7616   0.8347   0.8853

  18     0.16667  0.35252  0.53365  0.67671  0.77???  0.84934  0.89680
         0.1667   0.3495   0.5287   0.6719   0.778?   0.8473   0.8960

  19     0.16667  0.35847  0.54454  0.68955  0.7916?  0.86044  0.90612
         0.1667   0.3555   0.5398   0.6851   0.788?   0.8588   0.9056

  20     0.16667  0.36428  0.55508  0.70177  0.80334  0.87064  0.91454
         0.1667   0.3614   0.5506   0.6976   0.8006   0.8693   0.9143

* The five- and four-decimal place numbers in the body of the table are the correct
probabilities and normal approximations to the correct probabilities, respectively.
TABLE V

Probability* of a correct ranking as a function of the true variance ratio $\theta$ and the number of
degrees of freedom (n) from each population:
$P\{s^2_{(1)} < \min(s^2_{(2)}, s^2_{(3)}, s^2_{(4)})\}$. True variance ratio: $\sigma^2_{[i]}/\sigma^2_{[1]} = \theta$ (i = 2, 3, 4)

Degrees of freedom (n)


. 2500 ‘ ; . 2500 2500 
.31818 34783 } 40000 .42308 
.3149 3424 ‘ . 3903 4112 
. 3437 3841 .4207 4541 4845 
.36195 .41153 ° 49792 . 53523 
3664 .4169 0. . 5039 5411 


. 3859 .4450 . 5458 . 5882 


.46150 .52010 . 57221 .61848 
.4700 . 5296 . 5822 A285 
-4927 .5577 0.6145 .6638 


.50370 .57256 0.63244 68395 
.5136 . 5834 0.6435 .6949 


. 5330 0.6069 0.6698 0.7: 


. 25000 . ‘ . 54060 0.61741 0.68247 0.7% 
- 2500 0. : 5511 0.6286 0.6936 0. 


. 25000 
. 2500 ‘ ‘ 5681 0.6487 


. 25000 ’ . ; 0.65643 72478 77999 
. 2500 E ' ; 0.6675 . 735% 7901 


- 25000 
. 2500 . Af 5994 0.6850 . 7538 . 8083 


. 25000 

. 2500 . 3828 : .6138 0.7015 .7709 8249 
. 25000 

. 2500 . 3879 i 6276 0.7169 . 7866 


. 25000 
. 2500 3931 5! .6407 .7314 8011 





17 . 25000 
. 2500 . 3982 5367 . 6532 .7450 .8146 


18 . 25000 
. 2500 .4031 46 . 6651 .7579 .8270 8773 


19 . 25000 
0.2500 .4079 of .6766 .7700 8386 8876 


20 0.25000 
0.2500 0.4126 0.5637 0.6875 0.7815 0.8494 0.8970 
* The five- and four-decimal place numbers in the body of the table are the correct
probabilities and normal approximations to the correct probabilities, respectively.
the same for choosing the t smallest as for choosing the t largest (i.e., the k - t
smallest). The following relationships hold for the entries in Tables II and III:
1) For all n and $\theta > 1$, $P_c(\theta, n \mid \mathrm{II}) < P_c(\theta, n \mid \mathrm{III})$ and 2) For sufficiently
large n and $\theta > 1$, $P_c(\theta, n \mid \mathrm{II}) < P_a(\theta, n) < P_c(\theta, n \mid \mathrm{III})$.

It should be noted that $P_a(\theta, n)$ is very close to $P_c(\theta, n)$ throughout the range
of the various tables, and that therefore the normal approximation could be
used with very good results to fill out Tables I to V for n > 20. The approximation
also could be used (together with [9] or tables in [3]) for the construction
of additional tables (k = 4) for which exact formulae are unavailable or available
but unwieldy.

All of the $P_c(\theta, n)$ should be correct to the five decimal places which are given.
For Table I exact formulae were used in preference to interpolating in the incomplete
Beta function tables. For Tables II, III and IV exact formulae were
obtained for n = 1 (1) 8 (2) 20, and probabilities were computed to 8 decimal
places; for n = 9 (2) 19 the probabilities were computed by interpolation on the
values for even n using Everett's interpolation formula. For Table V exact formulae
were obtained for n = 2 (2) 12.

All of the $P_a(\theta, n)$ were computed by setting Var{$X_i$} equal to 2/(n - 1),
(see equation (35)). The univariate normal, bivariate normal, and trivariate
normal probabilities were found by interpolating in [8], [4], and [9], respectively.
An empirically noted fact which was not only interesting, but also extremely
useful from the viewpoint of checking the tables, was that for given $\theta$ the first
differences of the probability as a function of n were strictly decreasing, and all
of the higher differences were strictly increasing.
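
As an illustrative check on the two kinds of entries, the following sketch recomputes a few cells of Table I. It rests on two assumptions that are not spelled out in this section: for two populations the correct probability is taken to be P{F(n, n) < theta}, evaluated here only for even n through the finite binomial form of the incomplete Beta function, and the normal approximation is taken to be Phi(0.5 * sqrt(n - 1) * log theta), which is what treating log s^2 as normal with variance 2/(n - 1) would give. The function names are hypothetical.

```python
import math


def exact_prob_two_populations(theta: float, n: int) -> float:
    """P{s1^2 < s2^2} for two populations with n d.f. each (even n only).

    Assumption: the probability equals I_x(n/2, n/2) with x = theta/(1 + theta),
    written as a binomial tail because n/2 is an integer.
    """
    if n % 2 != 0:
        raise ValueError("this sketch handles even n only")
    m = n // 2
    x = theta / (1.0 + theta)
    top = 2 * m - 1
    return sum(math.comb(top, j) * x**j * (1.0 - x) ** (top - j)
               for j in range(m, top + 1))


def normal_approx_two_populations(theta: float, n: int) -> float:
    """Normal approximation, assuming Var{log s^2} = 2/(n - 1)."""
    z = 0.5 * math.sqrt(n - 1) * math.log(theta)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Compare both values for two cells of Table I.
for n, theta in [(2, 1.2), (6, 2.0)]:
    print(n, theta,
          round(exact_prob_two_populations(theta, n), 5),
          round(normal_approx_two_populations(theta, n), 4))
```

The odd-n entries would need a numerical evaluation of the incomplete Beta function, which is left out to keep the sketch dependent on the standard library only.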


7. Example. The following is an example to show how the tables are to be used. 
The model of Section 2.1 is assumed throughout. 

Given three populations. Suppose that it is desired to find which population
has the smallest variance, and to guarantee that the probability of correctly
choosing that population will be at least a) 0.60, b) 0.90, when $\sigma^2_{[2]}/\sigma^2_{[1]} = 1.8$.
How many observations must be taken from each population? (The information
in the tables is given in terms of d.f. The conversion of number of d.f. to number
of observations will depend on the nature of the problem at hand.)

a) Refer to Table II. We see that 6 d.f. from each population will meet the
requirements.

b) Refer to Table II. We see that 20 d.f. from each population is too small to
meet the requirements. To estimate the correct number we proceed as follows.
We compute $\sqrt{\tfrac{1}{2}(n - 1)}\,\log_e \theta_{21} = \sqrt{\tfrac{1}{2}(n - 1)}\,\log_e 1.8$ and set it equal to 2.2302.
(The number 2.2302 is obtained from [3], Table I, column headed k = 3, t = 1
opposite P = 0.90). Solving for n we find that 30 d.f. from each population will
meet the requirements.
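
The arithmetic of part b) can be sketched as follows. The function name is hypothetical; the constant 2.2302 is the value quoted above from [3], and the relation solved is sqrt((n - 1)/2) * log_e(theta) = 2.2302.

```python
import math


def degrees_of_freedom_needed(theta: float, tabled_value: float) -> int:
    """Smallest integer n with sqrt((n - 1) / 2) * ln(theta) >= tabled_value."""
    n_real = 1.0 + 2.0 * (tabled_value / math.log(theta)) ** 2
    return math.ceil(n_real)


print(degrees_of_freedom_needed(1.8, 2.2302))
```

Rounding up rather than to the nearest integer keeps the guaranteed probability at or above the requested level under the approximation.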


In terms of the problems considered in this paper, the quantities given in the
body of Tables I and II of [3] are $\sqrt{\tfrac{1}{2}(n - 1)}\,\log_e \theta_{t+1,t}$, and $\sqrt{\tfrac{1}{2}(n - 1)}\,\log_e \theta_{i+1,i}$
(i = 1, 2), respectively. These tables can be used for Goal I ($10 \ge k \ge 2$)
and Goal II (k = 3), respectively. No corresponding tables exist for Goal II
($k \ge 4$).
SOME THEOREMS ON QUADRATIC FORMS APPLIED IN THE STUDY
OF ANALYSIS OF VARIANCE PROBLEMS, I. EFFECT OF INEQUALITY
OF VARIANCE IN THE ONE-WAY CLASSIFICATION


By G. E. P. Box


Imperial Chemical Industries, Manchester, England and North Carolina State
College


1. Summary and introduction. This is the first of two papers describing a study
of the effect of departures from assumptions, other than normality, on the
null-distribution of the F-statistic in the analysis of variance. In this paper, certain
theorems required in the study and concerning the distribution of quadratic 
forms in multi-normally distributed variables are first enunciated and simple 
approximations tested numerically. The results are then applied to determine the 
effect of group-to-group inequality of variance in the one-way classification. It 
appears that if the groups are equal, moderate inequality of variance does not 
seriously affect the test. However, with unequal groups, much larger discrepancies 
appear. In a second paper, similar methods are used to determine the effect of 
inequality of variance and serial correlation between errors in the two-way 
classification. 


2. Distribution of quadratic forms in multi-normal variates. In what follows
we write $\chi^2(\nu)$ to denote a quantity distributed as $\chi^2$ with $\nu$ degrees of freedom
and $F(\nu_1, \nu_2)$ to denote a quantity distributed as the Fisher-Snedecor F with
$\nu_1$ and $\nu_2$ degrees of freedom.

By obvious extension of a theorem due to Cochran [1] we have

THEOREM 2.1. If z denotes a column vector of p random variables $z_1, z_2, \cdots, z_p$
having expectation zero and distributed in a multi-normal distribution with p x p
variance-covariance matrix V, and if $Q = z'Mz$ is any real quadratic form of rank
$r \le p$, then Q is distributed like a quantity

(2.1)    $X = \sum_{j=1}^{r} \lambda_j \chi^2(1)$

where each $\chi^2$ variate is distributed independently of every other and the $\lambda$'s are the
r real nonzero latent roots of the matrix

(2.2)    $U = VM$.

It readily follows that
THEOREM 2.2. The sth cumulant $K_s(Q)$ is given by

(2.3)    $K_s(Q) = 2^{s-1}(s - 1)! \sum_{j=1}^{r} \lambda_j^s$.

Received 6/15/53. 







Calculation of this quantity is often facilitated by using the relation

(2.4)    $\sum_{j=1}^{r} \lambda_j^s = \operatorname{tr}(VM)^s = \sum_{a=1}^{p} \sum_{b=1}^{p} \cdots \sum_{s=1}^{p} u_{ab} u_{bc} \cdots u_{sa}$

whence the first few cumulants of Q may readily be derived without actually
determining the $\lambda$'s. In particular

(2.5)    $K_1(Q) = \sum_{a=1}^{p} u_{aa}$,

(2.6)    $K_2(Q) = 2 \sum_{a=1}^{p} \sum_{b=1}^{p} u_{ab} u_{ba}$.
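
A minimal sketch of (2.3) and (2.4) in plain Python: the cumulants of Q = z'Mz are obtained from traces of powers of U = VM, without computing any latent roots. The helper names are hypothetical and matrices are represented as nested lists.

```python
from math import factorial

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a: Matrix) -> float:
    return sum(a[i][i] for i in range(len(a)))


def cumulant(v: Matrix, m: Matrix, s: int) -> float:
    """K_s(Q) = 2^(s-1) (s-1)! tr((VM)^s) for Q = z'Mz with z ~ N(0, V)."""
    u = matmul(v, m)
    power = u
    for _ in range(s - 1):
        power = matmul(power, u)
    return 2 ** (s - 1) * factorial(s - 1) * trace(power)


# Example: V = I and M = diag(1, 2), so the latent roots are 1 and 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
m_diag = [[1.0, 0.0], [0.0, 2.0]]
print([cumulant(identity, m_diag, s) for s in (1, 2, 3)])
```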
When the $\lambda_j$ are all positive, the following $\chi^2$ series due to Robbins and Pitman
[2] may be used to obtain the distribution of $X = \sum_{j=1}^{r} \lambda_j \chi^2(\nu_j)$.
THEOREM 2.3. If $X_0$ is some constant greater than zero then

(2.7)    $P_1 \le \Pr\{X > X_0\} \le P_2$

where

(2.8)    $P_1 = \sum_{l=0}^{n} c_l \Pr\{\chi^2(\nu + 2l) > X_0/a_1\} + \left(1 - \sum_{l=0}^{n} c_l\right) \Pr\{\chi^2(\nu + 2n + 2) > X_0/a_1\}$,

(2.9)    $P_2 = \sum_{l=0}^{n} c_l \Pr\{\chi^2(\nu + 2l) > X_0/a_1\} + \left(1 - \sum_{l=0}^{n} c_l\right)$

and the constants $c_l$ are such that $\sum_{l=0}^{\infty} c_l = 1$ and are defined by the identity

(2.10)    $\prod_{j=2}^{r} \{a_j^{-\nu_j/2} (1 - (1 - a_j^{-1})\omega)^{-\nu_j/2}\} = \sum_{l=0}^{\infty} c_l \omega^l$,

$a_1 = \lambda_1$ being the smallest of the $\lambda_j$, $a_j = \lambda_j/\lambda_1$ ($j \ne 1$), and $\sum_{j=1}^{r} \nu_j = \nu$.
If the component degrees of freedom $\nu_j$ are even, a finite $\chi^2$ series, derived
below, may be used whether the $\lambda_j$ are positive or not.
THEOREM 2.4. The exact distribution of $X = \sum_{j=1}^{r} \lambda_j \chi^2(\nu_j)$, where the $\nu_j = 2g_j$
are even integers, is a weighted finite sum of $\chi^2$ distributions,

(2.11)    $\Pr(X > X_0) = \sum_{j=1}^{r} \sum_{s=1}^{g_j} \alpha_{js} \Pr\{\chi^2(2s) > X_0/\lambda_j\}$

and $\alpha_{js}$ is a constant involving only the $\lambda$'s and is given by

(2.12)    $\alpha_{js} = f_j^{(g_j - s)}(0)/(g_j - s)!$

where $f_j^{(h)}(0)$ is obtained by differentiating $f_j(y)$ h times with respect to y and then
putting y = 0 and

(2.13)    $f_j(y) = \prod_{i \ne j} \left\{\frac{\lambda_j - \lambda_i}{\lambda_j} + \frac{\lambda_i}{\lambda_j} y\right\}^{-g_i}$.

In the special case r = 2, a series of this type has been used by Satterthwaite
[3]. The general theorem is conveniently proved as follows.
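PROOF. Since $g_j = \nu_j/2$ is an integer, the characteristic function

(2.14)    $\phi(t) = \prod_{j=1}^{r} (1 - 2it\lambda_j)^{-g_j}$

can be resolved into partial fractions

(2.15)    $\prod_{j=1}^{r} (1 - 2it\lambda_j)^{-g_j} = \sum_{j=1}^{r} \sum_{s=1}^{g_j} \alpha_{js} (1 - 2it\lambda_j)^{-s}$

where the $\alpha_{js}$ are constants not containing t. We recognise this expression as the
characteristic function of a quantity v whose probability density function is

(2.16)    $p(v) = \sum_{j=1}^{r} \sum_{s=1}^{g_j} \alpha_{js}\, p\{\lambda_j \chi^2(2s)\}$.

Hence X is distributed like v and equation (2.11) follows at once.
To find the values for the constants, (2.15) is written in the form

(2.17)    $(1 - 2it\lambda_i)^{-g_i} \prod_{j \ne i} (1 - 2it\lambda_j)^{-g_j} = \sum_{s=1}^{g_i} \alpha_{is} (1 - 2it\lambda_i)^{-s} + \sum_{j \ne i} \sum_{w=1}^{g_j} \alpha_{jw} (1 - 2it\lambda_j)^{-w}$.

Putting $y = 1 - 2it\lambda_i$ and multiplying both sides of the identity by $y^{g_i}$ we have

(2.18)    $\prod_{j \ne i} \left\{\frac{\lambda_i - \lambda_j}{\lambda_i} + \frac{\lambda_j}{\lambda_i} y\right\}^{-g_j} = \sum_{s=1}^{g_i} \alpha_{is} y^{g_i - s} + \sum_{j \ne i} \sum_{w=1}^{g_j} \alpha_{jw} \left\{\frac{\lambda_i - \lambda_j}{\lambda_i} + \frac{\lambda_j}{\lambda_i} y\right\}^{-w} y^{g_i}$.

If y is put equal to 0 we have

(2.19)    $\alpha_{i g_i} = \prod_{j \ne i} \left\{\frac{\lambda_i}{\lambda_i - \lambda_j}\right\}^{g_j}$.

To obtain the remaining constants we differentiate both sides of the identity h times
and then put y = 0. There will be no contribution from the second member on
the right-hand side of (2.18) and the term $\sum_{s=1}^{g_i} \alpha_{is} y^{g_i - s}$ will contribute $h!\,\alpha_{i, g_i - h}$.
Thus

(2.20)    $\alpha_{i, g_i - h} = f_i^{(h)}(0)/h!$  where  $f_i(y) = \prod_{j \ne i} \left\{\frac{\lambda_i - \lambda_j}{\lambda_i} + \frac{\lambda_j}{\lambda_i} y\right\}^{-g_j}$.
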
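In practice the constants can be most easily found as follows.

(2.21)    $f_i(y) = \prod_{j \ne i} \left\{\frac{\lambda_i}{\lambda_i - \lambda_j}\right\}^{g_j} \prod_{j \ne i} \left\{1 + \frac{\lambda_j}{\lambda_i - \lambda_j} y\right\}^{-g_j}$

(2.22)    $\phantom{f_i(y)} = \prod_{j \ne i} \left\{\frac{\lambda_i}{\lambda_i - \lambda_j}\right\}^{g_j} \exp\left\{-\sum_{j \ne i} g_j \log\left[1 + \frac{\lambda_j}{\lambda_i - \lambda_j} y\right]\right\}$.

Since t can always be chosen so that | y | < 1, we can expand each side of the
equation and equate coefficients.

(2.23)    $\sum_{h=0}^{\infty} \frac{f_i^{(h)}(0)}{h!} y^h = \prod_{j \ne i} \left\{\frac{\lambda_i}{\lambda_i - \lambda_j}\right\}^{g_j} \exp\left\{\sum_{h=1}^{\infty} \frac{y^h}{h} \sum_{j \ne i} g_j \left(\frac{\lambda_j}{\lambda_j - \lambda_i}\right)^h\right\}$.

The relation between $\alpha_{i, g_i - h}$ and the coefficient of $y^h$ on the right-hand side of
(2.23) is the same as that between the hth moment about the origin $\mu'_h$ and the
hth cumulant $K_h$. The well known equalities expressing the moments in terms
of the cumulants may be used, therefore, in calculating the coefficients required
above. For if we write

(2.24)    $K_h = (h - 1)! \sum_{j \ne i} g_j \left\{\frac{\lambda_j}{\lambda_j - \lambda_i}\right\}^h$

then

(2.25)    $\alpha_{i, g_i - h} = \alpha_{i g_i}\, \mu'_h / h!$,

(2.26)    $\alpha_{i g_i} = \prod_{j \ne i} \left\{\frac{\lambda_i}{\lambda_i - \lambda_j}\right\}^{g_j}$.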
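
The recipe for the constants and the finite series (2.11) can be sketched as below. This is an illustration under stated assumptions, not the author's computing scheme: the constants follow the cumulant-to-moment route of (2.24) to (2.26) as read here (leading constant times the hth moment over h!), the roots are assumed positive and distinct, and all function names are hypothetical.

```python
import math


def alpha_constants(lams: list[float], g: list[int]) -> list[list[float]]:
    """Return alpha[i][s - 1] for s = 1..g[i] (roots assumed distinct)."""
    r = len(lams)
    alphas = []
    for i in range(r):
        others = [j for j in range(r) if j != i]
        # Leading constant alpha_{i, g_i}, as in (2.19) and (2.26).
        lead = math.prod((lams[i] / (lams[i] - lams[j])) ** g[j] for j in others)
        # Cumulant-like quantities K_1 .. K_{g_i - 1} of (2.24); k[0] is unused.
        k = [0.0] + [
            math.factorial(h - 1)
            * sum(g[j] * (lams[j] / (lams[j] - lams[i])) ** h for j in others)
            for h in range(1, g[i])
        ]
        # Moments about the origin from cumulants (standard recursion).
        mu = [1.0]
        for h in range(1, g[i]):
            mu.append(sum(math.comb(h - 1, m - 1) * k[m] * mu[h - m]
                          for m in range(1, h + 1)))
        # alpha_{i, g_i - h} = alpha_{i, g_i} * mu_h / h!
        row = [0.0] * g[i]
        for h in range(g[i]):
            row[g[i] - h - 1] = lead * mu[h] / math.factorial(h)
        alphas.append(row)
    return alphas


def chi2_even_survival(s: int, x: float) -> float:
    """Pr{chi^2(2s) > x} for x >= 0 (closed form for even d.f.)."""
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(s))


def finite_series_survival(lams: list[float], g: list[int], x0: float) -> float:
    """Pr(X > x0) by the finite series (2.11); assumes all lambda_j > 0."""
    alphas = alpha_constants(lams, g)
    return sum(alphas[j][s - 1] * chi2_even_survival(s, x0 / lams[j])
               for j in range(len(lams)) for s in range(1, g[j] + 1))


# X = chi^2(4) + 2 chi^2(2): lambda = (1, 2), g = (2, 1).
print(alpha_constants([1.0, 2.0], [2, 1]))
print(finite_series_survival([1.0, 2.0], [2, 1], 2.0))
```

For negative roots the inequality inside each probability has to be handled separately, which is why the sketch restricts itself to positive roots.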


3. Investigation of the accuracy of a simple approximation to the distribution
of a nonnegative quadratic form. Welch [4], [5] and Fairfield Smith [6] have
represented the distribution of a particular nonnegative quadratic form, when r = 2,
by that of $Z = g\chi^2(h)$, the constants g and h being chosen so that the distribution
has the same first two moments as Q. Satterthwaite [3] has suggested its use in
the general case ($r \ge 2$) when we have

THEOREM 3.1.

$Q = z'Mz = \sum_{j=1}^{r} \lambda_j \chi^2(\nu_j)$

is distributed approximately as $g\chi^2(h)$ where

(3.1)    $g = \frac{K_2(Q)}{2K_1(Q)} = \frac{\sum \nu_j \lambda_j^2}{\sum \nu_j \lambda_j}$  and  $h = \frac{2\{K_1(Q)\}^2}{K_2(Q)} = \frac{(\sum \nu_j \lambda_j)^2}{\sum \nu_j \lambda_j^2}$.

It should be noted that the effective degrees of freedom, h, are necessarily
less than the number appropriate if the $\lambda_j$ were all equal. For if

$w_1, w_2, \cdots, w_j, \cdots, w_r$;  $\lambda_1, \lambda_2, \cdots, \lambda_j, \cdots, \lambda_r$

and u are any positive real numbers, then

(3.2)    $\sum_{j=1}^{r} w_j (\lambda_j - u)^2 \ge 0$

and if u is the weighted mean of the $\lambda$'s, that is if

(3.3)    $u = \sum_{j=1}^{r} w_j \lambda_j \Big/ \sum_{j=1}^{r} w_j$,

TABLE 1

Comparison of Approximate and Exact Distributions of Some Quadratic Forms

Values of $\lambda_j$ | Degrees of freedom $\nu_j$ | Exact probability (%) of exceeding approx. $100\alpha\%$ significance point

$100\alpha\%$ = 5.00   10.00   25.00   50.00   75.00   95.00



24.56 49.74 | 75.07 | 95.96 
24. 49.41 | 75.27 | 95.69 
23. 49.95 75.63 | 96.54 
22. 47.61 | 77.25 | 98.16 
23. 48.68 76.95 

24. 49.06 | 75.16 | 98. 
24. 49.32 | 75.14 


te a 


5.04 
5.08 


aeserast 


to & tS Ww Ww lO 


then (3.2) is equivalent to

(3.4)    $\sum_{j=1}^{r} w_j \lambda_j^2 - \left(\sum_{j=1}^{r} w_j \lambda_j\right)^2 \Big/ \sum_{j=1}^{r} w_j \ge 0$,

that is, taking $w_j = \nu_j$,

(3.5)    $\left(\sum_{j=1}^{r} \nu_j \lambda_j\right)^2 \Big/ \sum_{j=1}^{r} \nu_j \lambda_j^2 \le \sum_{j=1}^{r} \nu_j$.

Although an approximation of this kind has often been used (see for example
Patnaik [7]), investigations of its accuracy seem limited to the case k = 2 studied
by Satterthwaite [3].

Table 1 shows the exact probabilities (calculated from the finite series of
Theorem 2.4) of exceeding the significance points obtained from the approximation
for a number of particular quadratic forms. This brief investigation suggests
that the simple approximation is fairly good over a wide range of values of
$\nu$ and $\lambda$. However, when small differences in probability were to be examined,
it would be necessary to apply the method with caution and make checks by the
exact methods.
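
A short sketch of Theorem 3.1: given the roots and their degrees of freedom, it returns the scale g and the effective degrees of freedom h of (3.1), and the example confirms the bound (3.5). The function name is hypothetical.

```python
def chi2_approximation(lams: list[float], nus: list[int]) -> tuple[float, float]:
    """Return (g, h) such that Q = sum lambda_j chi^2(nu_j) is roughly g * chi^2(h)."""
    s1 = sum(nu * lam for lam, nu in zip(lams, nus))       # sum nu_j lambda_j
    s2 = sum(nu * lam**2 for lam, nu in zip(lams, nus))    # sum nu_j lambda_j^2
    return s2 / s1, s1**2 / s2


g, h = chi2_approximation([1.0, 2.0], [2, 2])
print(g, h)          # h cannot exceed the total degrees of freedom, here 4
```

When all the roots are equal the approximation is exact: g equals the common root and h equals the total degrees of freedom.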

4. Distribution of the ratio of independently distributed quadratic forms in
multi-normal variates. By canonical reduction of numerator and denominator,
the ratio $Y = Q_1/Q_2$ is seen to be distributed like the quantity

(4.1)    $X_1/X_2 = \left\{\sum_{j'=1}^{r'} \lambda'_{j'} \chi^2(\nu'_{j'})\right\} \Big/ \left\{\sum_{j=1}^{r} \lambda_j \chi^2(\nu_j)\right\}$.

By representing numerator and denominator by infinite $\chi^2$ series, Pitman and
Robbins [2] have obtained the distribution of Y, when the $\lambda$'s are all positive,
as an infinite series in which each term contains a probability calculated from
the F distribution (or more conveniently from the incomplete B-function). In
our application it will always be possible to arrange that component $\chi^2$'s (at
least of the denominator) have even degrees of freedom.

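Employing the Robbins-Pitman [2] infinite series in the numerator and the
finite series in the denominator, we readily obtain Theorem 4.1 (for example by
using Cramér's theorem [8] concerning the characteristic function of a ratio).

THEOREM 4.1. If the $\lambda'$'s of the numerator are all positive and if $\lambda'_1 = a'_1$ is the
smallest of the $\lambda'$'s and $\sum_{j'=1}^{r'} \nu'_{j'} = \nu'$, then

(4.2)    $P_1 \le \Pr(Y > Y_0) \le P_2$

where

(4.3)    $P_1 = \sum_{l'=0}^{n'} \sum_{j=1}^{r} \sum_{s=1}^{g_j} c'_{l'} \alpha_{js} \{I_{x_j}(s, \tfrac{1}{2}\nu' + l')\} + \left\{1 - \sum_{l'=0}^{n'} c'_{l'}\right\} \sum_{j=1}^{r} \sum_{s=1}^{g_j} \alpha_{js} \{I_{x_j}(s, \tfrac{1}{2}\nu' + n' + 1)\}$,

(4.4)    $P_2 = \sum_{l'=0}^{n'} \sum_{j=1}^{r} \sum_{s=1}^{g_j} c'_{l'} \alpha_{js} \{I_{x_j}(s, \tfrac{1}{2}\nu' + l')\} + \left\{1 - \sum_{l'=0}^{n'} c'_{l'}\right\}$.

The $c'$'s are obtained from (2.10), the $\alpha_{js}$ from (2.24), (2.25) and (2.26), $I_x(p, q)$ is
the incomplete Beta function integral, and $x_j = \{1 + (\lambda_j/a'_1) Y_0\}^{-1}$.

If both numerator and denominator of (4.1) have even degrees of freedom we may use
the finite series in both numerator and denominator and obtain

THEOREM 4.2. If $\nu_j = 2g_j$ and $\nu'_{j'} = 2g'_{j'}$ are even, then

(4.5)    $\Pr(Y > Y_0) = \sum_{j'=1}^{r'} \sum_{s'=1}^{g'_{j'}} \sum_{j=1}^{r} \sum_{s=1}^{g_j} \alpha'_{j's'} \alpha_{js} \{I_{x_{jj'}}(s, s')\}$

where the $\alpha'_{j's'}$ and $\alpha_{js}$ are obtained from (2.24), (2.25) and (2.26), and
$x_{jj'} = \{1 + \lambda_j Y_0/\lambda'_{j'}\}^{-1}$.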

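Alternatively, if the forms are nonnegative it is usually simpler to use the
following.

THEOREM 4.3. If $\lambda'_1, \lambda'_2, \cdots, \lambda'_{r'}$ and $\lambda_1, \lambda_2, \cdots, \lambda_r$ are all positive and the
$\nu'_{j'}$ and $\nu_j$ are all even then

(4.6)    $\Pr\left[\left\{\sum_{j'=1}^{r'} \lambda'_{j'} \chi^2(\nu'_{j'})\right\} \Big/ \left\{\sum_{j=1}^{r} \lambda_j \chi^2(\nu_j)\right\} > Y_0\right] = \sum_{i=1}^{r'} \sum_{s=1}^{g_i} \alpha_{is}$

where the $\alpha_{is}$ ($i = 1, 2, \cdots, r'$; $s = 1, \cdots, g_i$) are constants calculated from (2.24),
(2.25) and (2.26) for the form $\sum_{i=1}^{r+r'} \zeta_i \chi^2(\nu_i)$ in which $\zeta_1 = \lambda'_1$, $\zeta_2 = \lambda'_2$, $\cdots$,
$\zeta_{r'} = \lambda'_{r'}$, $\zeta_{r'+1} = -Y_0\lambda_1$, $\zeta_{r'+2} = -Y_0\lambda_2$, $\cdots$, $\zeta_{r'+r} = -Y_0\lambda_r$.
PROOF. Since the quadratic forms are nonnegative, the left-hand side of (4.6)
may be written

(4.7)    $P = \Pr\left\{\sum_{j'=1}^{r'} \lambda'_{j'} \chi^2(\nu'_{j'}) - \sum_{j=1}^{r} Y_0 \lambda_j \chi^2(\nu_j) > 0\right\}$

which is of the form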
TABLE 2

Comparison of Approximate and Exact Distributions of Ratios of Quadratic Forms

Numerator: degrees of freedom, values of $\lambda$'s | Denominator: degrees of freedom, values of $\lambda$'s ($\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$) | Exact probability (%) of exceeding approx. $100\alpha\%$ significance point

$100\alpha\%$ = 5.00   10.00   25.00   50.00


1 .71 |9.63/24.60/49.96 
24 |9.15/24.36|50.00 
.12)24.19]49.93 


1 
1 88 \9.76)24.59/49.91 
| 
3 


55/24 .31/49.80 


Nw hw NN WN & 
> & DO dO dO 


to 


9.95\24.86150.05 


(4.8)    $\Pr\left\{\sum_{i=1}^{r+r'} \zeta_i \chi^2(\nu_i) > 0\right\}$.

Using (2.11)

(4.9)    $P = \sum_{i=1}^{r'} \sum_{s=1}^{g_i} \alpha_{is} \Pr\{\chi^2(2s) > 0\} + \sum_{j=r'+1}^{r+r'} \sum_{w=1}^{g_j} \alpha_{jw} \Pr\{\chi^2(2w) < 0\}$.

But

(4.10)    $\Pr\{\chi^2(2s) > 0\} = 1$  and  $\Pr\{\chi^2(2w) < 0\} = 0$.

Therefore,

(4.11)    $P = \sum_{i=1}^{r'} \sum_{s=1}^{g_i} \alpha_{is}$.

It will be noted that when this series is applicable, the required probability
may be obtained directly without the rather tedious interpolation in the B-function
tables required by the method of Theorem 4.2.
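
To make the use of Theorem 4.3 concrete, here is a sketch for the simplest special case in which every component has 2 degrees of freedom, so each numerator root contributes a single constant of the product form (2.19). The restriction to 2 d.f., the function name, and the requirement that all the combined roots be distinct are assumptions of this illustration.

```python
import math


def ratio_tail_prob_two_df(num_lams: list[float], den_lams: list[float],
                           y0: float) -> float:
    """Pr{ sum lam'_j chi^2(2) / sum lam_j chi^2(2) > y0 } via (4.6).

    Every nu is 2 (g = 1). Roots are assumed positive, and the combined
    roots zeta are assumed distinct.
    """
    # Combined roots: numerator roots, then -y0 times the denominator roots.
    zetas = list(num_lams) + [-y0 * lam for lam in den_lams]
    total = 0.0
    for i in range(len(num_lams)):            # sum over numerator roots only
        total += math.prod(zetas[i] / (zetas[i] - zetas[j])
                           for j in range(len(zetas)) if j != i)
    return total


# A ratio of two independent chi^2(2) variates exceeds 1 with probability 1/2.
print(ratio_tail_prob_two_df([1.0], [1.0], 1.0))
print(ratio_tail_prob_two_df([1.0, 2.0], [1.0], 1.0))
```

No Beta-function evaluation is needed, which is the practical advantage the text points out.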


5. Use of Theorem 4.3 with quadratic forms that are not independent. This
method may be used in suitable cases even if the quadratic forms $Q_1$ and $Q_2$
are not distributed independently. For if $Q_1$ and $Q_2$ are nonnegative, we may
write

(5.1)    $P = \Pr\{Q_1/Q_2 > c\} = \Pr\{Q_1 - cQ_2 > 0\} = \Pr\{(z'M_1z - cz'M_2z) > 0\}$
         $= \Pr\{z'Mz > 0\} = \Pr\left\{\sum_{i=1}^{r'} \zeta'_i \chi^2(\nu'_i) + \sum_{j=1}^{r} \zeta_j \chi^2(\nu_j) > 0\right\}$

where $\zeta'_i$ ($i = 1, \cdots, r'$) is a positive latent root, repeated $\nu'_i$ times, of the matrix
$M = M_1 - cM_2$ and $\zeta_j$ ($j = 1, \cdots, r$) is a negative latent root of the same matrix
repeated $\nu_j$ times. In certain investigations (for example in the study of the
two-way classification of the analysis of variance table) it is possible to ensure that
the $\nu'_i$ and $\nu_j$ are all even, whence we may apply Theorem 4.3 and obtain

(5.2)    $P = \sum_{i=1}^{r'} \sum_{s=1}^{\frac{1}{2}\nu'_i} \alpha_{is}$.
tml se] 

6. The accuracy of a simple approximation to the distribution of the ratio
of independent nonnegative quadratic forms. Since approximation to the distribution
of a positive quadratic form Q by $g\chi^2(h)$ is fairly satisfactory, we may
attempt to approximate the ratio of two independent quadratic forms Q' and
Q by fitting $\chi^2$ distributions in both numerator and denominator, in the manner
described.

THEOREM 6.1. If Q' is distributed approximately as $g'\chi^2(h')$ and Q as $g\chi^2(h)$,
a quantity whose distribution approximates to that of the ratio Q'/Q is $bF(h', h)$,
where $b = (g'h')/(gh)$ and the g's and h's are found from (3.1). In fact

(6.1)    $b = K_1(Q')/K_1(Q)$,  $h' = 2\{K_1(Q')\}^2/K_2(Q')$,  $h = 2\{K_1(Q)\}^2/K_2(Q)$.

In Table 2 are shown the exact probabilities (calculated from the finite series of
Theorem 4.3) of exceeding the significance points obtained from the approximation.
The approximation is not of great accuracy, but may be usefully employed
to supplement the accurate (but less suggestive) exact methods.
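
The constants of Theorem 6.1 can be computed directly from the roots and degrees of freedom of the two forms, as in this sketch. It uses $K_1 = \sum \nu_j \lambda_j$ and $K_2 = 2\sum \nu_j \lambda_j^2$, which follow from (2.3) when each root $\lambda_j$ is counted $\nu_j$ times; the function name is hypothetical.

```python
def f_approximation(num_lams: list[float], num_nus: list[int],
                    den_lams: list[float], den_nus: list[int]
                    ) -> tuple[float, float, float]:
    """Return (b, h_num, h_den): Q'/Q is roughly b * F(h_num, h_den)."""

    def first_two_cumulants(lams, nus):
        k1 = sum(nu * lam for lam, nu in zip(lams, nus))
        k2 = 2.0 * sum(nu * lam**2 for lam, nu in zip(lams, nus))
        return k1, k2

    k1_num, k2_num = first_two_cumulants(num_lams, num_nus)
    k1_den, k2_den = first_two_cumulants(den_lams, den_nus)
    b = k1_num / k1_den
    h_num = 2.0 * k1_num**2 / k2_num
    h_den = 2.0 * k1_den**2 / k2_den
    return b, h_num, h_den


print(f_approximation([1.0, 2.0], [2, 2], [1.0], [4]))
```

The fractional degrees of freedom that result would then be used with an F table or an incomplete Beta routine to obtain approximate significance points.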


7. The one-way analysis of variance classification. Data, which it is desired 
to test for group to group homogeneity of mean value, often are obtained in cir- 
cumstances where group-to-group homogeneity of variance is not to be expected. 


To quote one of many examples: in biological work where each observation is
the response observed with a particular animal and the subject of enquiry is the 
comparison of the effects of treatments applied to the animal, the application of 
the treatment itself would often be expected to cause extra variability, and the 
extent of this extra variability would vary with the type and manner of treatment 
applied. 

The problem of the effect of unequal group variances was considered in the
case of the t test by Welch [5]. He obtained approximate probabilities from which
it appeared that the effect was small when the groups were of equal size, but 
larger when they were different in size. Later some exact probabilities for this 
case were found by Hsu [9] and another investigation by a different approximate 
method was made by Grunow [10]. Both these investigations confirmed Welch’s 
results. Quensel [11] considered the one-way analysis of variance classification 
more generally and obtained an approximate expression for the variance of the 
F criterion when the group variances differed. He concluded that the test would 
not be greatly affected if the group sizes were equal. 

David and Johnson [12], [13], [14] have discussed the general problem of the 
power function of analysis of variance criteria when the observations are dis- 
tributed independently but do not necessarily follow the normal distribution 
or have constant variance. As a special case they consider the one-way classifi- 
TABLE 3

Analysis of Variance. Group Variances Unequal

Source         | Deg. fr. | Sum of squares Q | Expectation of Q | Null distribution of Q
Between groups | k - 1    | $Q_B = \sum_{t=1}^{k} n_t(\bar{y}_{t.} - \bar{y}_{..})^2$ | $\sum_{t=1}^{k} n_t\gamma_t^2 + \sum_{t=1}^{k} (1 - n_t/N)\sigma_t^2$ | $\sum_{t=1}^{k-1} \lambda_t \chi^2(1)$
Within groups  | N - k    | $Q_W = \sum_{t=1}^{k} \sum_{i=1}^{n_t} (y_{ti} - \bar{y}_{t.})^2$ | $\sum_{t=1}^{k} (n_t - 1)\sigma_t^2$ | $\sum_{t=1}^{k} \sigma_t^2 \chi^2(n_t - 1)$


cation in which the observations are normally distributed but the variances 
differ from group to group. Their method is different from that given here and 
is an approximate one. At the time of writing they have published few numerical 
results and these [14] are confined to the case in which the sizes of the groups are 
all equal. Confirming the results of Quensel, only slight changes in probability, 
from those expected if the assumptions were true, have been found. 

Using the theorems on quadratic forms discussed above, the required prob- 
abilities may be found exactly, while the simple F approximation enables the 
nature of the effects found to be presented in a readily appreciated form. 

Suppose we have $N = \sum_{t=1}^{k} n_t$ observations classified into $k$ groups. The
$i$th observation in the $t$th group is $y_{ti}$, the mean of the $t$th column $\bar y_{t.}$, and the
grand mean $\bar y_{..}$, and there are $n_t$ observations in the $t$th group. Then in the
usual analysis of variance, the quantities $Q_B$ and $Q_W$ shown in the third column
of Table 3 are calculated and are associated with degrees of freedom shown in
the second column of the table.

It is usually assumed that 


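(7.1) $y_{ti} = \eta_t + z_{ti}$,


where $\eta_t = \alpha + \gamma_t$ is the population mean for the $t$th group, $\sum_{t=1}^{k} n_t\gamma_t = 0$,
and the $z_{ti}$ are errors distributed normally and independently about zero with
the same variance $\sigma^2$. We retain all the assumptions except the last, and, instead
of supposing the variance constant, we postulate that $\mathcal{E}(z_{ti}^2) = \sigma_t^2$, where
$\sigma_1^2, \sigma_2^2, \cdots, \sigma_t^2, \cdots, \sigma_k^2$ are not necessarily all equal. Then


(7.2) $Q_B = \sum_{t=1}^{k} n_t(\gamma_t + \bar z_{t.} - \bar z_{..})^2$,


(7.3) $\mathcal{E}(Q_B) = \sum_{t=1}^{k} n_t\gamma_t^2 + \mathcal{E}\sum_{t=1}^{k} n_t(\bar z_{t.} - \bar z_{..})^2$.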


We notice that when the null hypothesis is true, $Q_B$ is a quadratic form in the
variables $\bar z_{1.}, \cdots, \bar z_{t.}, \cdots, \bar z_{k.}$. The matrix $M = \{m_{ts}\}$ of this form is evidently
$N^{-1}\{\delta_{ts}n_tN - n_tn_s\}$, where $\delta_{ts}$ is the Kronecker delta. Also the variables follow
the multi-normal distribution with diagonal variance-covariance matrix $V$ whose
$t$th element is $\sigma_t^2/n_t$. It follows that the matrix $U$ of Theorem 2.1 is


(7.4) $U = VM = N^{-1}\{\delta_{ts}\sigma_t^2N - \sigma_t^2n_s\}$.

Using (7.3) and Theorems 2.1 and 2.2, the expectations and null-distribution
of $Q_B$ are those shown in the last two columns of Table 3, $\lambda_1, \cdots, \lambda_{k-1}$ being
the latent roots of the matrix $U$. Again, from (7.1),


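(7.5) $Q_W = \sum_{t=1}^{k}\sum_{i=1}^{n_t}(y_{ti} - \bar y_{t.})^2 = \sum_{t=1}^{k}\sum_{i=1}^{n_t}(z_{ti} - \bar z_{t.})^2$.

Also, since $\sum_{i=1}^{n_t}(z_{ti} - \bar z_{t.})^2$ is distributed independently of $\bar z_{t.}$, like $\sigma_t^2\chi^2(n_t - 1)$,
it follows that $Q_W$ is distributed like $\sum_{t=1}^{k}\sigma_t^2\chi^2(n_t - 1)$ independently of $Q_B$.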
We may now employ Theorem 4.1 to obtain the exact probability that the ratio 
of mean squares will exceed the significance points of the tabled F distribution, 
for any chosen set of group variances. In addition, an approximate value of this 
probability may be obtained using Theorem 6.1 with equations (2.5) and (2.6). 
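We find that the ratio of mean squares is distributed approximately as $bF(h', h)$
where


(7.6) $b = \dfrac{N - k}{N(k - 1)}\;\dfrac{\sum_t (N - n_t)\sigma_t^2}{\sum_t (n_t - 1)\sigma_t^2}$,


(7.7) $h' = \{\sum_t (N - n_t)\sigma_t^2\}^2 / \{(\sum_t n_t\sigma_t^2)^2 + N\sum_t (N - 2n_t)\sigma_t^4\}$,


(7.8) $h = \{\sum_t (n_t - 1)\sigma_t^2\}^2 / \{\sum_t (n_t - 1)\sigma_t^4\}$.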
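
The three constants of the approximating distribution can be evaluated directly from the group sizes and group variances. The following Python sketch is an illustration only: the function name `approx_f_parameters` and its argument layout are assumptions made here, not part of the paper, and the formulas coded are (7.6), (7.7) and (7.8) as read from the text.

```python
from typing import Sequence, Tuple


def approx_f_parameters(n: Sequence[int], var: Sequence[float]) -> Tuple[float, float, float]:
    """Return (b, h_prime, h) for the approximation b*F(h_prime, h).

    n   : group sizes n_t
    var : group variances sigma_t^2 (same order as n)
    """
    k = len(n)
    N = sum(n)
    # sum_t (N - n_t) sigma_t^2 and sum_t (n_t - 1) sigma_t^2
    s_between = sum((N - nt) * v for nt, v in zip(n, var))
    s_within = sum((nt - 1) * v for nt, v in zip(n, var))

    b = (N - k) / (N * (k - 1)) * s_between / s_within            # (7.6)
    h_prime = s_between ** 2 / (                                  # (7.7)
        sum(nt * v for nt, v in zip(n, var)) ** 2
        + N * sum((N - 2 * nt) * v ** 2 for nt, v in zip(n, var))
    )
    h = s_within ** 2 / sum((nt - 1) * v ** 2 for nt, v in zip(n, var))  # (7.8)
    return b, h_prime, h


if __name__ == "__main__":
    # With equal variances the usual F(k - 1, N - k) is recovered.
    print(approx_f_parameters([5, 5, 5], [1.0, 1.0, 1.0]))
```

With equal variances the sketch returns b = 1, h' = k - 1 and h = N - k, which is a convenient sanity check before trying unequal variances.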


A number of calculations of both exact and approximate probabilities are shown
in Table 4. It is seen that in the cases studied the approximation, although not
of great accuracy, faithfully indicates the order and direction of the effects and
enables a clear idea to be gained of the general effects to be expected.


TABLE 4

Probabilities of Exceeding 5% Point when Variances are Unequal in the One-Way Analysis of
Variance Table

[Table body illegible in the source. Columns: group variances; number of observations in
groups and total; values $b$, $h'$, $h$ in the approximating distribution $bF(h', h)$; exact and
approximate probability (%) of exceeding the 5% point.]


8. Equal groups. For equal groups the comparison of mean squares is unbiased 
in the sense that the expectations of the mean squares are equal when the null 
hypothesis is true. In fact


$\mathcal{E}(Q_B)/(k - 1) = \mathcal{E}(Q_W)/(N - k) = \bar\sigma^2$, where $\bar\sigma^2 = (\sum \sigma_t^2)/k$.


The mean squares are distributed independently but their ratio does not follow
the distribution $F(k - 1, N - k)$. Instead, applying the approximation, we find
after a little reduction of (7.7) and (7.8) that the ratio of mean squares is
distributed approximately as $F\{(k - 1)\epsilon', (N - k)\epsilon\}$ where $\epsilon'$ and $\epsilon$, the factors
by which the degrees of freedom are reduced, are given by


(8.1) $\epsilon' = \left\{1 + \dfrac{k - 2}{k - 1}c^2\right\}^{-1}$, $\epsilon = (1 + c^2)^{-1}$,


and $c$ is the coefficient of variation of the variances. That is to say, $c^2$ is the
variance of the variances divided by the square of the mean variance $\bar\sigma^2$:


(8.2) $c^2 = \dfrac{1}{k}\sum_{t=1}^{k}(\sigma_t^2 - \bar\sigma^2)^2/(\bar\sigma^2)^2$.


Since, when the variances are unequal, $\epsilon'$ and $\epsilon$ are less than unity, one would
expect that the significance of effects would be somewhat overestimated. Comparison
in Table 4, of rows (2) and (6) with (1), and of row (11) with row (10),
confirms this, and shows that for the differences in variance considered, only
moderate discrepancies in probability occur.

Now the $\sigma^2$'s are essentially positive and it is easily seen that


(8.3) $0 \le c^2 \le k - 1$,


and if the variances range from a lower value $\sigma^2$ to an upper value $a\sigma^2$, then the
largest possible value for $c^2$ is attained when $k - 1$ of the variances are equal to
$\sigma^2$ and the remaining one is equal to $a\sigma^2$, and that in this case


(8.4) $c^2 = (k - 1)(a - 1)^2/(a - 1 + k)^2$.
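
For equal groups the reduction factors depend only on $k$ and on the ratio $a$ of the largest variance to the others. The short Python sketch below evaluates (8.4) and then (8.1); the function name `equal_group_factors` is an assumption introduced here for illustration.

```python
from typing import Tuple


def equal_group_factors(k: int, a: float) -> Tuple[float, float, float]:
    """Return (c2, eps_prime, eps) when k - 1 variances equal sigma^2
    and the remaining one equals a * sigma^2 (equal group sizes)."""
    c2 = (k - 1) * (a - 1) ** 2 / (a - 1 + k) ** 2        # (8.4)
    eps_prime = 1.0 / (1.0 + (k - 2) / (k - 1) * c2)      # (8.1), between groups
    eps = 1.0 / (1.0 + c2)                                # (8.1), within groups
    return c2, eps_prime, eps


if __name__ == "__main__":
    for k in (3, 6, 10):
        for a in (3, 6):
            c2, e1, e2 = equal_group_factors(k, a)
            print(f"k={k:2d} a={a}  c2={c2:.2f}  eps'={e1:.2f}  eps={e2:.2f}")
```

For three groups and $a = 3$ this gives $c^2 = 0.32$, $\epsilon' = 0.86$ and $\epsilon = 0.76$ to two decimals.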


TABLE 5

Values of $c^2$, $\epsilon'$ and $\epsilon$

Largest Variance is $a$ Times Larger than Each of the Remaining Variances

[Columns: 3 groups, 6 groups, 10 groups, each for $a = 3$ and $a = 6$. Most entries are
illegible in the source; the surviving values are $c^2$: 0.32, 0.31, 1.03; $\epsilon'$: 0.86, 0.80, 0.55;
$\epsilon$: 0.76, 0.50.]

Values for $c^2$ greater than one or at most two probably would be extremely
rare in practice, although from the inequality (8.3) it is seen that values of $c^2$
as great as $k - 1$, and hence values of $\epsilon$ as small as $1/k$, could occur. Some idea
of the discrepancy arising in a very unfavorable case is obtained by considering
the example $k = 7$, $n_1 = n_2 = \cdots = n_7 = 3$, $a = 10$; the exact probability of
exceeding the assumed 5 per cent point is then 12.0 per cent.


9. Unequal groups. It will be observed that the more serious discrepancies in 
Table 4 arise when the groups are unequal. This is because with unequal groups 
the comparison of mean squares is usually biased. If we write $\bar\sigma^{*2}$ for the weighted
mean variance $\sum \nu_t\sigma_t^2/\sum \nu_t$, and $\bar\sigma^2$ for the unweighted mean variance $\sum \sigma_t^2/k$,
where the weight $\nu_t = n_t - 1$ is the number of degrees of freedom in the $t$th
group, then the expression (7.6) for the bias coefficient reduces to the form

(9.1) $b = 1 + \dfrac{1 - 1/N}{1 - 1/k}\left(\dfrac{\bar\sigma^2}{\bar\sigma^{*2}} - 1\right)$.

The bias is seen to depend upon the ratio of the unweighted and weighted means
of the variances.

In this connection it is of interest to consider the examples of rows (2), (3),
(4) and (5) in Table 4. In each case the total number of observations is 15 and
the unweighted mean variance is 2. In (2) the numbers in the groups are equal,
the weighted and unweighted means agree, and there is no bias, the discrepancy
in probability being small. In (3) the numbers are unequal but the distribution is
symmetrical, the weighted and unweighted means again agree, and again the
discrepancy is small and of the same order as that found before. In (4) the
weighted mean variance of 1.67 is lower than the unweighted mean variance of
2, causing an upward bias and a marked discrepancy in the direction of
overestimation of significance. In (5) on the other hand, the weighted mean of 2.33
exceeds 2, causing a downward bias corresponding with a discrepancy in
probability resulting in underestimation of significance.

We have seen that in the case of equal groups, the discrepancy, as measured by
a reduction in the degrees of freedom of the approximating F distribution, is
dependent on the spread of the distribution of variances as measured by the
coefficient of variation. The feature of the distribution of variances which affects
the bias, on the other hand, is related to the "skewness" of that distribution
as measured by the ratio of weighted and unweighted means.

The factors which multiply the degrees of freedom in the approximation may
be written in this case of unequal groups


(9.2) $\epsilon' = \{1 + c^2(\lambda)\}^{-1}$, $\epsilon = \{1 + \bar c^2\}^{-1}$,


where $c(\lambda)$ is the coefficient of variation of the $\lambda$'s and $\bar c$ is the weighted
coefficient of variation of the variances. That is to say, $\bar c^2$ is the weighted variance of
the variances divided by the square of the weighted mean variance $\bar\sigma^{*2}$:


(9.3) $\bar c^2 = (N - k)^{-1}\sum_{t=1}^{k}\nu_t(\sigma_t^2 - \bar\sigma^{*2})^2/(\bar\sigma^{*2})^2$.

The variation among the $\lambda$'s will be similar although somewhat less in extent
than the variation among the $\sigma_t^2$'s so that, as before, the coefficients $\epsilon'$ and $\epsilon$
will depend upon the spread of the $\sigma_t^2$'s.
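
The dependence of the bias on the two mean variances can be checked numerically. The sketch below is illustrative only; both function names are assumptions. It computes the bias coefficient once from (7.6) and once from the weighted and unweighted mean variances as in (9.1), using group sizes and variances chosen here so that the total is 15 and the unweighted mean variance is 2, as in the rows of Table 4 discussed above.

```python
from typing import Sequence


def bias_coefficient_direct(n: Sequence[int], var: Sequence[float]) -> float:
    """Bias coefficient b from (7.6)."""
    k, N = len(n), sum(n)
    num = sum((N - nt) * v for nt, v in zip(n, var))
    den = sum((nt - 1) * v for nt, v in zip(n, var))
    return (N - k) / (N * (k - 1)) * num / den


def bias_coefficient_from_means(n: Sequence[int], var: Sequence[float]) -> float:
    """Bias coefficient b from (9.1), via the two mean variances."""
    k, N = len(n), sum(n)
    unweighted = sum(var) / k
    weighted = sum((nt - 1) * v for nt, v in zip(n, var)) / (N - k)
    return 1.0 + (1.0 - 1.0 / N) / (1.0 - 1.0 / k) * (unweighted / weighted - 1.0)


if __name__ == "__main__":
    # Larger groups carry the larger variances: weighted mean 2.33, b below 1.
    print(bias_coefficient_direct([3, 5, 7], [1.0, 2.0, 3.0]))
    # Larger groups carry the smaller variances: weighted mean 1.67, b above 1.
    print(bias_coefficient_from_means([7, 5, 3], [1.0, 2.0, 3.0]))
```

The two routes agree, and the sign of $b - 1$ follows the ordering of the weighted and unweighted means.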

Study of Table 4, particularly rows (4), (7), (8), (9), (12), and (13), shows 
that quite large discrepancies can occur when the groups are unequal for even 
moderate variations of variance. Furthermore, it is clear that these discrepan- 
cies will persist in larger samples, for as the sample sizes are increased propor- 
tionately the bias coefficient will tend to the fixed limit 


(9.4) $1 + \{k/(k - 1)\}\{\bar\sigma^2/\bar\sigma^{*2} - 1\}$.

assistance with the computations.


LEAST-SQUARES ESTIMATES USING ORDERED OBSERVATIONS


By F. Downton 
University of Liverpool 


1. Introduction and summary. The purpose of this paper is to compare for 
various two-parameter distributions, of the form $f\{(x - \mu)/\sigma\}/\sigma$, the estimates
of the parameters obtained by applying the method of least squares to the 
observations, after these have been arranged in order of magnitude. Estimates 
obtained by this process we shall call “ordered least-squares estimates.’’ Such 
estimates are unbiased and have minimal variance among all unbiased esti- 
mates which are linear in the ordered observations. 

This estimation process has been previously discussed by Godwin [1] and 
[2] and Lloyd [3]. In the present paper, ordered estimates are obtained explicitly 
for a class of two-parameter distributions having the above form. This class 
contains the rectangular and the right triangular distributions as special cases. 
It also reduces to the exponential distribution as a limiting case. Other special 
cases of this class of distributions have also been previously discussed by 
Craig [4]. 

Further, a general property of ordered least-squares estimates of the parameter
$\lambda$ in distributions of the type $f(x/\lambda)/\lambda$ is discussed. As a result it is shown that the
ordered least-squares estimate of the scale parameter in the Pearson Type III 
distribution is identical with the maximum likelihood estimate. 


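2. Notation and general theory. Let $x_1, x_2, x_3, \cdots, x_n$ be a sample of $n$
independent observations on a continuous variate $X$ whose distribution has the
form $f\{(x - \mu)/\sigma\}/\sigma$. We may write


$x_{(1)} \le x_{(2)} \le x_{(3)} \le \cdots \le x_{(n)}$


for the ordered observations.

Let $y_r = (x_r - \mu)/\sigma$ and $y_{(r)} = (x_{(r)} - \mu)/\sigma$ be the reduced observations,
unordered and ordered, respectively.

For $r, s = 1, 2, 3, \cdots, n$, let $\mathcal{E}(y_{(r)}) = \alpha_r$, $\mathrm{cov}(y_{(r)}, y_{(s)}) = v_{rs}$.

Let $\boldsymbol\alpha$ denote the $(n \times 1)$ vector of the $\alpha_r$; $v$ the symmetric, positive-definite
$(n \times n)$ matrix of the $v_{rs}$; $\mathbf{1}$ an $(n \times 1)$ vector of 1's; $\mathbf{x}$ the $(n \times 1)$ vector of the
$x_{(r)}$; and $\mathbf{y}$ the $(n \times 1)$ vector of the $y_{(r)}$. The inverse of $v$ is $v^{-1}$ with
elements $v^{rs}$.

The ordered least-squares estimates of $\mu$ and $\sigma$ are then


(2.1) $\hat\mu = \boldsymbol\alpha'v^{-1}(\boldsymbol\alpha\mathbf{1}' - \mathbf{1}\boldsymbol\alpha')v^{-1}\mathbf{x}/\Delta$


(2.2) $\hat\sigma = \mathbf{1}'v^{-1}(\mathbf{1}\boldsymbol\alpha' - \boldsymbol\alpha\mathbf{1}')v^{-1}\mathbf{x}/\Delta$

where

(2.3) $\Delta = (\mathbf{1}'v^{-1}\mathbf{1})(\boldsymbol\alpha'v^{-1}\boldsymbol\alpha) - (\mathbf{1}'v^{-1}\boldsymbol\alpha)^2$

(2.4) $\mathrm{var}(\hat\mu) = \boldsymbol\alpha'v^{-1}\boldsymbol\alpha\,\sigma^2/\Delta$

(2.5) $\mathrm{var}(\hat\sigma) = \mathbf{1}'v^{-1}\mathbf{1}\,\sigma^2/\Delta$

(2.6) $\mathrm{cov}(\hat\mu, \hat\sigma) = \mathbf{1}'v^{-1}\boldsymbol\alpha\,\sigma^2/\Delta$.

These results were given by Lloyd [3].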
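
The matrix expressions for the two estimates translate almost literally into array code. The following Python sketch is illustrative: the names `ordered_least_squares` and `rectangular_moments` are assumptions, NumPy is assumed to be available, and the rectangular moments used for the check are the standard ones for uniform order statistics rather than anything taken from this section.

```python
import numpy as np


def ordered_least_squares(x, alpha, v):
    """Ordered least-squares estimates (mu_hat, sigma_hat).

    x     : sample (sorted internally)
    alpha : expectations of the reduced ordered observations
    v     : covariance matrix of the reduced ordered observations
    """
    x = np.sort(np.asarray(x, dtype=float))
    alpha = np.asarray(alpha, dtype=float)
    v_inv = np.linalg.inv(np.asarray(v, dtype=float))
    one = np.ones_like(alpha)

    delta = (one @ v_inv @ one) * (alpha @ v_inv @ alpha) - (one @ v_inv @ alpha) ** 2
    # alpha 1' - 1 alpha' is skew-symmetric; its negative appears in sigma_hat.
    skew = np.outer(alpha, one) - np.outer(one, alpha)
    mu_hat = alpha @ v_inv @ skew @ v_inv @ x / delta
    sigma_hat = -(one @ v_inv @ skew @ v_inv @ x) / delta
    return mu_hat, sigma_hat


def rectangular_moments(n):
    """alpha and v for a rectangular parent with zero mean and unit variance."""
    r = np.arange(1, n + 1)
    alpha = np.sqrt(12.0) * (r / (n + 1) - 0.5)
    lo = np.minimum.outer(r, r)
    hi = np.maximum.outer(r, r)
    v = 12.0 * lo * (n + 1 - hi) / ((n + 1) ** 2 * (n + 2))
    return alpha, v


if __name__ == "__main__":
    alpha, v = rectangular_moments(4)
    print(ordered_least_squares([7.0, 1.0, 4.0, 2.0], alpha, v))
```

For a rectangular parent the result reduces to the mid-range and a multiple of the range, which gives an easy check on the implementation.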


3. Two-parameter distribution with explicit ordered least-squares solution. 
We introduce the generalized geometric variate $X$ whose density function is


(3.1) $f(x) = \dfrac{p}{b^p\sigma}\left(\dfrac{x - \mu}{\sigma} + a\right)^{p-1}$, $\mu - a\sigma \le x \le \mu - (a - b)\sigma$,
      $f(x) = 0$, otherwise,


where $p \ge 1$, and $a = \sqrt{p(p + 2)}$ and $b = \sqrt{(p + 2)(p + 1)^2/p}$.
The expectation and variance of $X$ are


$\mathcal{E}(X) = \mu$, $\mathrm{var}(X) = \sigma^2$.


It will be shown that for all $p \ge 1$ it is possible to find explicitly the
expectation vector and the variance matrix of the reduced ordered observations, and
hence the ordered least-squares estimates of $\mu$ and $\sigma$.

The standardised form of the variate $X$ is $Y = (X - \mu)/\sigma$, for which


$\mathcal{E}(Y) = 0$, $\mathrm{var}(Y) = 1$.

The density function of $Y$ will be

(3.2) $f(y) = pb^{-p}(y + a)^{p-1}$, $-a \le y \le b - a$,
      $f(y) = 0$, otherwise.

In order to apply the results of Section 2 it is convenient to define the variate


(3.3) $T = (Y + a)/b = (X - \mu + a\sigma)/b\sigma$.


Its density function is


$f(t) = pt^{p-1}$, $0 \le t \le 1$,
$f(t) = 0$, otherwise.


Its distribution function is


(3.4) $F(t) = 0$, $t < 0$; $F(t) = t^p$, $0 \le t \le 1$; $F(t) = 1$, $t > 1$.

Its expectation and variance are, respectively,

(3.5) $\mathcal{E}(T) = a/b = p/(p + 1)$,
      $\mathrm{var}(T) = 1/b^2 = p/(p + 2)(p + 1)^2$.

For the vector $\mathbf{t}$ of ordered values of $T$, let

$\mathcal{E}(\mathbf{t}) = \mathbf{a}$, $\mathrm{var}(\mathbf{t}) = w$.


To obtain the relationship between these quantities and the expectation and
variance of the vector of reduced ordered observations $\mathbf{y}$, we note that since
(3.3) gives $T$ as a monotonic increasing function of $Y$, it follows that


(3.6) $\mathbf{y} = b\mathbf{t} - a\mathbf{1}$


where $\mathbf{t}$ is the vector of ordered observations on $T$. Taking expectations and
variances,


$\boldsymbol\alpha = b\mathbf{a} - a\mathbf{1}$, $v = b^2w$.

In terms of $p$ we then have

(3.7) $\boldsymbol\alpha = (p + 1)\sqrt{(p + 2)/p}\,\{\mathbf{a} - p\mathbf{1}/(p + 1)\}$

(3.8) $v^{-1} = pw^{-1}/(p + 2)(p + 1)^2$.

We now turn to the explicit calculation of these vectors and matrices. The
$r$th element of $\mathbf{a}$ is


$a_r = \{\Gamma(n + 1)/\Gamma(r)\Gamma(n - r + 1)\}\int_0^1 t\{F(t)\}^{r-1}\{1 - F(t)\}^{n-r}f(t)\,dt$


where $f(t)$ and $F(t)$ are defined in (3.4). This reduces to

(3.9) $a_r = n^{(n-r+1)}p^{n-r+1}/(np + 1)^{(n-r+1)}$


where $n^{(s)} = n(n - 1)\cdots(n - s + 1)$ and $(np + 1)^{(s)} = (np + 1)([n - 1]p + 1)\cdots([n - s + 1]p + 1)$.
Similarly if $w_{rs}$ is the $(r, s)$ element of $w$,

(3.10) $w_{rr} = \{n^{(n-r+1)}p^{n-r+1}/(pn + 2)^{(n-r+1)}\} - a_r^2$

and for $r < s$

(3.11) $w_{rs} = w_{sr} = \{n^{(n-r+1)}p^{n-r+1}/(pn + 2)^{(n-s+1)}(p[s - 1] + 1)^{(s-r)}\} - a_ra_s$.

If $w^{rs}$ is the $(r, s)$ element of a matrix $w^{-1}$, where

(3.12) $w^{rs} = 0$ when $|r - s| > 1$,


(3.13) $w^{r,r-1} = w^{r-1,r} = -(p[r - 1] + 1)(pn + 2)^{(n-r+2)}/p^{n-r+1}n^{(n-r+1)}$,


(3.14) $w^{rr} = \{p^2(2r^2 - 2r + 1) + 2p(2r - 1) + 1\}(pn + 2)^{(n-r+1)}/p^{n-r+1}n^{(n-r+1)}$,

it may be shown by multiplication that $ww^{-1} = I$, the unit matrix, and hence
$w^{-1}$ is the unique inverse of $w$.
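
The claim that the tridiagonal matrix is the inverse of $w$ can be checked numerically for small $n$. The Python sketch below is a reading of (3.9) to (3.14), not the author's own computation: the helper names are assumptions, NumPy is assumed, and every bracketed power $(x)^{(s)}$ with $x$ of the form $np + 1$, $pn + 2$ or $p[s - 1] + 1$ is taken to descend in steps of $p$, in the same way as the definition given after (3.9).

```python
import numpy as np


def falling(x, step, s):
    """x (x - step) (x - 2 step) ... with s factors; equals 1 when s = 0."""
    out = 1.0
    for j in range(s):
        out *= x - j * step
    return out


def t_moments(n, p):
    """Expectation vector a and variance matrix w of the ordered values of T."""
    a = np.empty(n)
    w = np.empty((n, n))
    for r in range(1, n + 1):
        m = n - r + 1
        a[r - 1] = falling(n, 1, m) * p ** m / falling(n * p + 1, p, m)       # (3.9)
    for r in range(1, n + 1):
        for s in range(r, n + 1):
            # E(T_(r) T_(s)); for r = s this is the first term of (3.10).
            e_rs = (falling(n, 1, n - r + 1) * p ** (n - r + 1)
                    / (falling(p * n + 2, p, n - s + 1)
                       * falling(p * (s - 1) + 1, p, s - r)))
            w[r - 1, s - 1] = w[s - 1, r - 1] = e_rs - a[r - 1] * a[s - 1]    # (3.10), (3.11)
    return a, w


def w_inverse(n, p):
    """Tridiagonal matrix with elements given by (3.12) to (3.14)."""
    w_inv = np.zeros((n, n))
    for r in range(1, n + 1):
        m = n - r + 1
        scale = p ** m * falling(n, 1, m)
        diag = p * p * (2 * r * r - 2 * r + 1) + 2 * p * (2 * r - 1) + 1
        w_inv[r - 1, r - 1] = diag * falling(p * n + 2, p, m) / scale          # (3.14)
        if r > 1:
            off = -(p * (r - 1) + 1) * falling(p * n + 2, p, m + 1) / scale    # (3.13)
            w_inv[r - 1, r - 2] = w_inv[r - 2, r - 1] = off
    return w_inv


if __name__ == "__main__":
    a, w = t_moments(3, 2)
    print(np.round(w @ w_inverse(3, 2), 10))
```

Printing the product for a few small values of $n$ and $p$ should give the unit matrix up to rounding error.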


We now evaluate various quantities needed for computing the estimates;
we have

(3.15) $\mathbf{a}'w^{-1}\mathbf{a} = pn(pn + 2)$

(3.16) $\mathbf{a}'w^{-1}\mathbf{1} = \mathbf{1}'w^{-1}\mathbf{a} = (pn + 1)(pn + 2)$

(3.17) $\mathbf{1}'w^{-1}\mathbf{1} = (p - 1)S + (pn + 1)(pn + 2) + 2(pn + 2)^{(n)}/n!\,p^n$

where

(3.18) $S = (pn + 2)/(p - 2) - 2(pn + 2)^{(n)}/n!\,p^n(p - 2)$, for $p \ne 2$,

(3.19) $S = (n + 1)\sum_{r=1}^{n} 1/r$, for $p = 2$.
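We now convert these expressions into the form in which they are used in
Section 2, using equations (3.7) and (3.8). Then

$\Delta = (\mathbf{1}'v^{-1}\mathbf{1})(\boldsymbol\alpha'v^{-1}\boldsymbol\alpha) - (\mathbf{1}'v^{-1}\boldsymbol\alpha)^2$

$\quad = \dfrac{p}{(p + 1)^2(p + 2)}\{(\mathbf{1}'w^{-1}\mathbf{1})(\mathbf{a}'w^{-1}\mathbf{a}) - (\mathbf{1}'w^{-1}\mathbf{a})^2\}$

(3.20) $\quad = \dfrac{p(pn + 2)}{(p + 1)^2(p + 2)(p - 2)}\left\{(pn + 2)(p[n - 1] + 2) - \dfrac{2pn(pn + 2)^{(n)}}{n!\,p^n}\right\}$, $p \ne 2$,


$\boldsymbol\alpha'v^{-1}(\boldsymbol\alpha\mathbf{1}' - \mathbf{1}\boldsymbol\alpha')v^{-1} = p\{\mathbf{a}' - p\mathbf{1}'/(p + 1)\}w^{-1}(\mathbf{a}\mathbf{1}' - \mathbf{1}\mathbf{a}')w^{-1}/(p + 1)^2(p + 2)$


and

(3.23) $\mathbf{1}'v^{-1}(\mathbf{1}\boldsymbol\alpha' - \boldsymbol\alpha\mathbf{1}')v^{-1} = p^{3/2}\,\mathbf{1}'w^{-1}(\mathbf{1}\mathbf{a}' - \mathbf{a}\mathbf{1}')w^{-1}/(p + 1)^3(p + 2)^{3/2}$.


If, in the $(1 \times n)$ vector $\boldsymbol\alpha'v^{-1}(\boldsymbol\alpha\mathbf{1}' - \mathbf{1}\boldsymbol\alpha')v^{-1} = \boldsymbol\beta$, the $r$th term is, say, $\beta_r$, then


(3.24) $\beta_r = \dfrac{p(p - 1)(n - 1)(pn + 2)(pn + 2)^{(n-r+1)}}{(p + 1)^3(p + 2)\,n^{(n-r+1)}p^{n-r}} + \eta_r$,


where, with $S$ as defined in (3.18) and (3.19),


(3.25a) $\eta_1 = \dfrac{2p^2(n - 1)(pn + 2)(pn + 2)^{(n)}}{(p + 1)^3(p + 2)\,n!\,p^n}$,

(3.25b) $\eta_r = 0$ for $2 \le r \le n - 1$,

(3.25c) $\eta_n = \dfrac{p(pn + 1)(pn + 2)}{(p + 1)^3(p + 2)}\left\{p(p - 1)S - (p + 1)(pn + 2) + \dfrac{2p(pn + 2)^{(n)}}{n!\,p^n}\right\}$.

In the $(1 \times n)$ vector $\mathbf{1}'v^{-1}(\mathbf{1}\boldsymbol\alpha' - \boldsymbol\alpha\mathbf{1}')v^{-1} = \boldsymbol\gamma$, say, if the $r$th term is $\gamma_r$, then

(3.26) $\gamma_r = \dfrac{-p^{3/2}(p - 1)(pn + 1)(pn + 2)(pn + 2)^{(n-r+1)}}{(p + 1)^3(p + 2)^{3/2}\,n^{(n-r+1)}p^{n-r+1}} + \nu_r$,


where


(3.27a) $\nu_1 = \dfrac{-2p^{3/2}(pn + 1)(pn + 2)(pn + 2)^{(n)}}{(p + 1)^3(p + 2)^{3/2}\,n!\,p^n}$,

(3.27b) $\nu_r = 0$ for $2 \le r \le n - 1$,

(3.27c) $\nu_n = \dfrac{p^{3/2}(pn + 1)(pn + 2)}{(p + 1)^3(p + 2)^{3/2}}\left\{(p - 1)S + \dfrac{2(pn + 2)^{(n)}}{n!\,p^n}\right\}$.

The only other quantities necessary for substitution in the expressions of
Section 2 are $\mathbf{1}'v^{-1}\mathbf{1}$, $\boldsymbol\alpha'v^{-1}\boldsymbol\alpha$, and $\mathbf{1}'v^{-1}\boldsymbol\alpha$. These quantities may be shown to be
given by


(3.28) $\mathbf{1}'v^{-1}\mathbf{1} = \dfrac{p}{(p + 1)^2(p + 2)}\left\{(p - 1)S + (pn + 1)(pn + 2) + \dfrac{2(pn + 2)^{(n)}}{n!\,p^n}\right\}$,


(3.29) $\boldsymbol\alpha'v^{-1}\boldsymbol\alpha = pn(pn + 2) + \dfrac{p}{(p + 1)^2}\left\{p(p - 1)S - (p + 2)(pn + 1)(pn + 2) + \dfrac{2p(pn + 2)^{(n)}}{n!\,p^n}\right\}$,


(3.30) $\mathbf{1}'v^{-1}\boldsymbol\alpha = \dfrac{p^{1/2}(pn + 1)(pn + 2)}{(p + 1)(p + 2)^{1/2}} - \dfrac{p^{3/2}}{(p + 1)^2(p + 2)^{1/2}}\left\{(p - 1)S + (pn + 1)(pn + 2) + \dfrac{2(pn + 2)^{(n)}}{n!\,p^n}\right\}$.

As the expressions obtained in this section tend to be rather long, we do not
write out the explicit formulae for the estimates. In any special case we merely
have to substitute the values of (3.20) et seq. in the general formulae (2.1) to
(2.6). In the following section we proceed to consider some special cases.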


4. Special cases. 

a) Rectangular distribution, p = 1. Although the solution of the rectangular
distribution is well-known it is quoted here since it is a special case of the system
solved in Section 3.

For the rectangular distribution, centered at $\mu$ with variance $\sigma^2$, the density
function is


$f(x) = 1/2\sqrt{3}\sigma$, $\mu - \sqrt{3}\sigma \le x \le \mu + \sqrt{3}\sigma$,
$f(x) = 0$, otherwise.


Using $p = 1$ in the expressions already derived yields the values for the
expectation vector, variance matrix and inverse of the variance matrix as given by
Lloyd (1952). The resulting estimates of $\mu$ and $\sigma$ are:


$\hat\mu = \tfrac{1}{2}(x_{(1)} + x_{(n)})$, $\hat\sigma = (n + 1)(x_{(n)} - x_{(1)})/2\sqrt{3}(n - 1)$,

$\mathrm{var}(\hat\mu) = 6\sigma^2/(n + 1)(n + 2)$, $\mathrm{var}(\hat\sigma) = 2\sigma^2/(n - 1)(n + 2)$,

$\mathrm{cov}(\hat\mu, \hat\sigma) = 0$.

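b) Right triangular distribution, p = 2. A less familiar example is the right
triangular distribution, which is unsymmetrical. A convenient form for the
density function is


$f(x) = \{(x - \mu)/\sigma + 2\sqrt{2}\}/9\sigma$, $\mu - 2\sqrt{2}\sigma \le x \le \mu + \sqrt{2}\sigma$,
$f(x) = 0$, otherwise.


By substituting $p = 2$ in the general expressions of Section 3, the elements of
the expectation vector $\boldsymbol\alpha$ and the variance matrix $v$ are found to be


$\alpha_r = \{6n^{(n-r+1)}2^{n-r+1}/(2n + 1)^{(n-r+1)} - 4\}/\sqrt{2}$,

$v_{rr} = 18\{r/(n + 1) - [n^{(n-r+1)}2^{n-r+1}/(2n + 1)^{(n-r+1)}]^2\}$,

$v_{rs} = (s - 1)^{(s-r)}2^{s-r}v_{ss}/(2s - 1)^{(s-r)}$, $r < s$.

Also, for $v^{-1}$, the inverse of $v$, we have

$v^{-1} = \dfrac{n + 1}{18}\begin{pmatrix} 9 & -6 & 0 & \cdots & 0 \\ -6 & 33/2 & -10 & \cdots & 0 \\ 0 & -10 & 73/3 & \cdots & 0 \\ \vdots & & & \ddots & -2(2n - 1) \\ 0 & 0 & \cdots & -2(2n - 1) & (8n^2 + 1)/n \end{pmatrix}$.

Thus the estimates of $\mu$ and $\sigma$, and their variances, are


$\hat\mu = \tfrac{1}{3}\left[(n - 1)\left\{2x_{(1)} + \sum x_{(r)}/r\right\} + (2n + 1)x_{(n)}\left\{\sum 1/r - 1\right\}\right]\Big/\left[n\sum 1/r - 1\right]$,

$\hat\sigma = \dfrac{2n + 1}{6\sqrt{2}}\left[\left(\sum 1/r + 2\right)x_{(n)} - 2x_{(1)} - \sum x_{(r)}/r\right]\Big/\left[n\sum 1/r - 1\right]$,

$\mathrm{var}(\hat\mu) = \dfrac{2\sigma^2}{n + 1}\left[\sum 1/r + n - 2\right]\Big/\left[n\sum 1/r - 1\right]$,

$\mathrm{var}(\hat\sigma) = \dfrac{\sigma^2}{4(n + 1)}\left[\sum 1/r + 4(n + 1)\right]\Big/\left[n\sum 1/r - 1\right]$,

$\mathrm{cov}(\hat\mu, \hat\sigma) = \dfrac{\sigma^2}{\sqrt{2}(n + 1)}\left[2n - 1 - \sum 1/r\right]\Big/\left[n\sum 1/r - 1\right]$,


where all summations are for $r$ from 1 to $n$ unless otherwise indicated.

To facilitate estimation of the parameters of a distribution of this type, the
elements of the expectation vector $\boldsymbol\alpha$, variance matrix $v$, and the coefficients of
the ordered observations in the estimates $\hat\mu$ and $\hat\sigma$ have been computed for
samples of size $n \le 10$.

Tables I and II give the values of $\alpha_r$ and $v_{rs}$ (for $r \le s$). Values of $v_{rs}$ ($r > s$)
may be determined by considerations of symmetry. Tables III and IV give the
values of $b_r$ and $c_r$, the elements of $\mathbf{b}$ and $\mathbf{c}$, respectively, where $\hat\mu = \mathbf{b}'\mathbf{x}$ and
$\hat\sigma = \mathbf{c}'\mathbf{x}$. Table V gives the values of $\mathrm{var}(\hat\mu)/\sigma^2$, $\mathrm{var}(\hat\sigma)/\sigma^2$ and $\mathrm{cov}(\hat\mu, \hat\sigma)/\sigma^2$.

Perhaps in some cases it would be more natural to estimate the extremities of
the distribution. Since these extremities are linear functions of the parameters
$\mu$ and $\sigma$, their least square estimates will be the same linear functions of $\hat\mu$ and $\hat\sigma$.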
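
The coefficients of the ordered observations for the right triangular case can be generated for any $n \ge 2$ from harmonic sums. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and the closed forms coded are those obtained here by putting $p = 2$ in the results of Section 3, so they should be read as a reconstruction rather than as the author's printed formulae. For small $n$ they agree with the entries of Tables III and IV.

```python
import math
from typing import List, Tuple


def right_triangular_coefficients(n: int) -> Tuple[List[float], List[float]]:
    """Coefficient vectors (b, c) with mu_hat = b'x and sigma_hat = c'x, n >= 2."""
    if n < 2:
        raise ValueError("at least two observations are needed")
    harmonic = sum(1.0 / r for r in range(1, n + 1))
    denom = n * harmonic - 1.0

    # mu_hat: common term (n - 1) / (3 r denom), plus extras at both ends.
    b = [(n - 1) / (3.0 * r * denom) for r in range(1, n + 1)]
    b[0] += 2.0 * (n - 1) / (3.0 * denom)
    b[-1] += (2 * n + 1) * (harmonic - 1.0) / (3.0 * denom)

    # sigma_hat: common term -k / r, plus extras at both ends.
    k = (2 * n + 1) / (6.0 * math.sqrt(2.0) * denom)
    c = [-k / r for r in range(1, n + 1)]
    c[0] -= 2.0 * k
    c[-1] += k * (harmonic + 2.0)
    return b, c


if __name__ == "__main__":
    b, c = right_triangular_coefficients(3)
    print([round(value, 4) for value in b])
    print([round(value, 4) for value in c])
```

The coefficients of $\hat\mu$ sum to one and those of $\hat\sigma$ sum to zero, as they must for unbiased location and scale estimates.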


5. Single parameter system. When a distribution is defined by a single
parameter, $\lambda$, say, which is a measure of dispersion, then the density function will
be of the form $f(x/\lambda)/\lambda$. If we take $Y = X/\lambda$, then $Y$ will have a nonparametric
distribution, and we may assume that the expectation vector, $\mathcal{E}(\mathbf{y}) = \boldsymbol\alpha$, and the
variance matrix, $\mathrm{var}(\mathbf{y}) = v$, are known. Then the ordered least-squares
estimate, $\hat\lambda$, of $\lambda$ is


(5.1) $\hat\lambda = \boldsymbol\alpha'v^{-1}\mathbf{x}/\boldsymbol\alpha'v^{-1}\boldsymbol\alpha$

and

(5.2) $\mathrm{var}(\hat\lambda) = \lambda^2/\boldsymbol\alpha'v^{-1}\boldsymbol\alpha$.

The distributions of Section 3 may clearly be reduced to a system of single
parameter distributions to give density functions of the type (for $p \ge 1$):

(5.3) $f(x) = p(b - x/\lambda)^{p-1}/\lambda b^p$; $0 \le x \le b\lambda$,

where $b = (p + 1)\sqrt{(p + 2)/p}$.

TABLE I


Expectation vector, $\boldsymbol\alpha$

(Entries are right-aligned on $r = n$; the leading entries of each row are missing in the source.)

n =  2                                                                                  +0.56569
n =  3                                                                       +0.08081  +0.80812
n =  4                                                            -0.24244  +0.40406  +0.94281
n =  5                                                 -0.47753  +0.11020  +0.59997  +1.02852
n =  6                                      -0.65837  -0.11585  +0.33625  +0.73183  +1.08786
n =  7                           -0.80304  -0.29669  +0.12527  +0.49448  +0.82677  +1.13137
n =  8                -0.92218  -0.44561  -0.04848  +0.20002  +0.61176  +0.80844  +1.16465
n =  9     -1.02251  -0.57103  -0.19479  +0.13441  +0.43070  +0.70229  +0.95448  +1.19002
n = 10                -0.32020   0.00667  +0.27550  +0.53416  +0.77435  +0.99052  +1.21218





TABLE II


Variance matrix, $v$

(Each line lists $v_{1s}, v_{2s}, \cdots, v_{ss}$ for the stated $n$ and $s$, unless another range of $r$ is shown.
Columns missing in the source are omitted.)

n = 2
  s = 2:   0.32000  0.48000

n = 3
  s = 2:   0.35755  0.53633
  s = 3:   0.14694  0.22041  0.27551

n = 4
  s = 2:   0.34177  0.51265
  s = 3:   0.18721  0.28082  0.35102
  s = 4:   0.08127  0.12190  0.15238  0.17778

n = 5
  s = 2:   0.31551  0.47327
  s = 3:   0.19439  0.29158  0.36448
  s = 4:   0.11250  0.16875  0.21004  0.24610
  s = 5:   0.05037  0.07556  0.09445  0.11019  0.12397

n = 6
  s = 2:   0.28913  0.43369
  s = 3:   0.18998  0.28497  0.35621
  s = 4:   0.12368  0.18552  0.23190  0.27055
  s = 5:   0.07384  0.11075  0.13844  0.16152  0.18171
  s = 6:   0.03372  0.05059  0.06323  0.07377  0.08299  0.09129

n = 7
  s = 2:   0.26519  0.3977
  s = 3:   0.18149  0.27224  0.34030
  s = 4:   0.12603  0.18904  0.23630  0.27568
  s = 5:   0.08464  0.12696  0.15869  0.18514  0.20829
  s = 6:   0.05154  0.07731  0.09664  0.11275  0.12684  0.13953
  s = 7:   0.02387  0.03580  0.04476  0.05221  0.05874  0.06462  0.07000

n = 8
  s = 1:   0.38498
  s = 2:   0.24414  0.36621
  s = 3:   0.17184  0.2577   0.32220
  s = 4:   0.12429  0.18644  0.23304  0.27189
  s = 5:   0.08903  0.13355  0.16604  0.19476  0.21910
  s = 6:   0.06100  0.09149  0.11437  0.13343  0.15011  0.16512
  s = 7:   0.03766  0.05649  0.07062  0.08239  0.09269  0.10195  0.11045
  s = 8:   0.01762  0.02643  0.03304  0.03854  0.04336  0.04770  0.05167  0.05536

n = 9
  s = 1:   0.35051
  s = 2:   0.22576  0.33865
  s = 3:   0.16221  0.24331  0.30414
  s = 4:   0.12067  0.18100  0.22625  0.26396
  s = 5:   0.09004  0.13506  0.16882  0.19696  0.22158
  s = 6 (r = 3, ..., 6):   0.12337  0.14393  0.16192  0.17811
  s = 7:   0.04570  0.06856  0.08570  0.09998  0.11247  0.12372  0.13403
  s = 8:   0.02851  0.04276  0.05345  0.06236  0.07016  0.07718  0.08361  0.08958
  s = 9:   0.01344  0.02016  0.02520  0.02940  0.03308  0.03639  0.03942  0.04224  0.04488

n = 10
  s = 1:   0.32163
  s = 2:   0.20072  0.31458
  s = 3:   0.15306  0.22959  0.28699
  s = 4:   0.11624  0.17435  0.21704  0.25427
  s = 5:   0.08921  0.13381  0.16726  0.19514  0.21953
  s = 6:   0.06790  0.10185  0.12732  0.14854  0.16710  0.18381
  s = 7:   0.05031  0.07547  0.09433  0.11006  0.12381  0.13620  0.14755
  s = 9:   0.02220  0.03729  0.04162  0.04855  0.05462  0.06008  0.06509  0.06974  0.07410
  s = 10:  0.01053  0.01579  0.01974  0.02303  0.02591  0.02850  0.03088  0.03309  0.03515  0.03711


TABLE III

Coefficients for estimating $\mu$. (Vector b)

n =  2   0.5000  0.5000
n =  3   0.4444  0.0741  0.4815
n =  4   0.4091  0.0682  0.0455  0.4773
n =  5   0.3840  0.0640  0.0427  0.0320  0.4773
n =  6   0.3650  0.0608  0.0406  0.0304  0.0243  0.4789
n =  7   0.3499  0.0583  0.0389  0.0292  0.0233  0.0194  0.4810
n =  8   0.3375  0.0562  0.0375  0.0281  0.0225  0.0187  0.0161  0.4834
n =  9   0.3271  0.0545  0.0363  0.0273  0.0218  0.0182  0.0156  0.0136  0.4857
n = 10   0.3181  0.0530  0.0353  0.0265  0.0212  0.0177  0.0151  0.0133  0.0118  0.4879


TABLE IV

Coefficients for estimating $\sigma$. (Vector c)

n =  2   -0.8839  +0.8839
n =  3   -0.5500  -0.0917  +0.6416
n =  4   -0.4339  -0.0723  -0.0482  +0.5544
n =  5   -0.3734  -0.0622  -0.0415  -0.0311  +0.5082
n =  6   -0.3355  -0.0559  -0.0373  -0.0280  -0.0224  +0.4790
n =  7   -0.3030  -0.0505  -0.0337  -0.0253  -0.0202  -0.0168  +0.4495
n =  8   -0.2898  -0.0483  -0.0322  -0.0241  -0.0193  -0.0161  -0.0138  +0.4436
n =  9   -0.2746  -0.0458  -0.0305  -0.0229  -0.0183  -0.0153  -0.0131  -0.0114  +0.4319
n = 10   -0.2624  -0.0437  -0.0292  -0.0219  -0.0175  -0.0146  -0.0125  -0.0109  -0.0097  +0.4225


TABLE V

Variance and covariance of estimates

  n    var(mu-hat)/sigma^2   var(sigma-hat)/sigma^2   cov(mu-hat, sigma-hat)/sigma^2

  2         0.5000                 0.5625                    0.1768
  3         0.3148                 0.2477                    0.1244
  4         0.2227                 0.1506                    0.09482
  5         0.1691                 0.1051                    0.07599
  6         0.1345                 0.07938                   0.06304
  7         0.1107                 0.06303                   0.05364
  8         0.09340                0.04946                   0.04652
  9         0.08037                0.04205                   0.04097
 10         0.07024                0.03641                   0.03652

It is of interest to study this single parameter system, chiefly because the
limiting case, as $p \to \infty$, is the exponential distribution $f(x) = e^{-x/\lambda}/\lambda$. Before
considering this system, however, it is useful to consider one general property
of estimates of this type. Lloyd derived necessary and sufficient conditions for
the variance of the ordered least-squares estimate of the mean to attain its upper
bound $\sigma^2/n$ for symmetric distributions. The author of the present paper has
since extended these conditions to include the unsymmetric case [5]. This would
suggest that in the single parameter case it may be possible to derive an upper
bound for $\mathrm{var}(\hat\lambda)$, using a similar approach.

We first note that the reduced ordered observations $y_{(r)}$ ($= x_{(r)}/\lambda$, here) are
a permutation of the unordered observations $y_r$, and hence


(5.4) $\mathbf{1}'\boldsymbol\alpha = \sum \mathcal{E}(y_{(r)}) = \mathcal{E}(\sum y_{(r)}) = \mathcal{E}(\sum y_r) = n\mathcal{E}(Y)$,


where $\mathcal{E}(Y)$ is the mean of the nonparametric parent population. A similar
argument shows that


(5.5) $\mathbf{1}'v\mathbf{1} = n\,\mathrm{var}(Y)$.


We also note that both $v$ and $v^{-1}$ are symmetric and positive definite and
hence may be expressed as $v = tt'$ and $v^{-1} = (t^{-1})'t^{-1}$, where $t$ is a lower
triangular matrix. It then follows that

$\boldsymbol\alpha'v^{-1}\boldsymbol\alpha = \boldsymbol\alpha'(t^{-1})'t^{-1}\boldsymbol\alpha = \mathbf{h}'\mathbf{h} = \sum h_i^2$,

say, where $\mathbf{h} = t^{-1}\boldsymbol\alpha$. Similarly we find that

$\mathbf{1}'v\mathbf{1} = \mathbf{1}'tt'\mathbf{1} = \mathbf{k}'\mathbf{k} = \sum k_i^2$,

say, where $\mathbf{k} = t'\mathbf{1}$.
Now, the Cauchy-Schwarz inequality $(\sum h_i^2)(\sum k_i^2) \ge (\sum h_ik_i)^2$ in matrix
form becomes $(\boldsymbol\alpha'v^{-1}\boldsymbol\alpha)(\mathbf{1}'v\mathbf{1}) \ge (\mathbf{1}'\boldsymbol\alpha)^2$, or

(5.6) $\boldsymbol\alpha'v^{-1}\boldsymbol\alpha \ge n[\mathcal{E}(Y)]^2/\mathrm{var}(Y)$.

The necessary and sufficient condition that the equality shall hold is that $k_i = qh_i$
for some constant $q$ and for all $i$. In matrix form this becomes

(5.7) $v\mathbf{1} = q\boldsymbol\alpha$.

Premultiplying by $\mathbf{1}'$, it follows that, necessarily


(5.8) $q = \mathrm{var}(Y)/\mathcal{E}(Y)$.


Since $\mathrm{var}(\hat\lambda) = \lambda^2/\boldsymbol\alpha'v^{-1}\boldsymbol\alpha$, the variance of the ordered least-squares estimate
of $\lambda$ has an upper bound such that


(5.9) $\mathrm{var}(\hat\lambda) \le \lambda^2\,\mathrm{var}(Y)/n[\mathcal{E}(Y)]^2$.


This upper bound is attained if, and only if,


(5.10) $\mathcal{E}(Y)\cdot v\mathbf{1} = \mathrm{var}(Y)\cdot\boldsymbol\alpha$.


It may also be shown by substituting (5.10) in (5.1) that if this upper bound
is attained the estimate of $\lambda$ is, necessarily, given by


(5.11) $\hat\lambda = \mathbf{1}'\mathbf{x}/n\mathcal{E}(Y)$,


which is proportional to the arithmetic mean.

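We now proceed to examine the ordered least-squares estimates of $\lambda$ for
distributions of the type given by equation (5.3). In the notation of Section 3, we
consider a variate $T = 1 - Y/b = 1 - X/\lambda b$, for which $\mathcal{E}(\mathbf{t}) = \mathbf{a}$ and $\mathrm{var}(\mathbf{t}) = w$.
Then for the distribution (5.3),

(5.12) $\boldsymbol\alpha = (p + 1)\sqrt{p + 2}\,(\mathbf{1} - J\mathbf{a})/\sqrt{p}$

(5.13) $v = (p + 1)^2(p + 2)JwJ/p$

(5.14) $v^{-1} = pJw^{-1}J/(p + 1)^2(p + 2)$


where $J$ is the permutation matrix

$J = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & \cdots & 0 & 0 \end{pmatrix}$.

Then the $r$th element of $\boldsymbol\alpha$ is given by

(5.15) $\alpha_r = (p + 1)\sqrt{p + 2}\,\{1 - n^{(r)}p^r/(pn + 1)^{(r)}\}/\sqrt{p}$.

Also, for $r > s$, when $a_r$ satisfies (3.9),

(5.16) $v_{rs} = v_{sr} = (p + 1)^2(p + 2)\{n^{(r)}p^r/(pn + 2)^{(s)}(p[n - s] + 1)^{(r-s)} - a_{n-r+1}a_{n-s+1}\}/p$,

(5.17) $v_{rr} = (p + 1)^2(p + 2)\{n^{(r)}p^r/(pn + 2)^{(r)} - a_{n-r+1}^2\}/p$.

Also

(5.18) $v^{rs} = 0$, $|r - s| \ge 2$,

(5.19) $v^{r,r+1} = v^{r+1,r} = -p(p[n - r] + 1)(pn + 2)^{(r+1)}/p^rn^{(r)}(p + 1)^2(p + 2)$,

(5.20) $v^{rr} = \dfrac{p\{p^2(2[n - r + 1]^2 - 2[n - r + 1] + 1) + 2p(2[n - r + 1] - 1) + 1\}(pn + 2)^{(r)}}{p^rn^{(r)}(p + 1)^2(p + 2)}$.

Also, with $S$ satisfying (3.18) and (3.19),


(5.21) $\boldsymbol\alpha'v^{-1}\boldsymbol\alpha = (p - 1)S + 2(pn + 2)^{(n)}/p^nn! - (pn + 2)$

(5.22) $\quad = (pn + 2)/(p - 2) - 2(pn + 2)^{(n)}/p^nn!(p - 2)$, $p \ne 2$.

The $r$th element of the vector $\boldsymbol\alpha'v^{-1}$ is given by

(5.23) $\{\boldsymbol\alpha'v^{-1}\}_r = (p - 1)(pn + 2)^{(r)}\sqrt{p}/(p + 1)n^{(r)}p^r\sqrt{p + 2} + \eta_r$,

where $\eta_r = 0$ for $1 \le r \le n - 1$, and

(5.24) $\eta_n = 2(pn + 2)^{(n)}\sqrt{p}/(p + 1)n!\,p^n\sqrt{p + 2}$.

From these expressions the ordered least-squares estimate of $\lambda$ and its variance
may be calculated.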


6. Exponential distribution. Taking the limit of (5.3) as $p \to \infty$, we find
$f(x) = e^{-x/\lambda}/\lambda$. The expressions given for $\boldsymbol\alpha$ and $v$ in Section 5 now become
indeterminate. However, taking limits in (5.19) and (5.20), we obtain


$v^{-1} = \dfrac{1}{2}\begin{pmatrix} (2n - 1)^2 + 1 & -2(n - 1)^2 & 0 & \cdots & 0 \\ -2(n - 1)^2 & (2n - 3)^2 + 1 & -2(n - 2)^2 & \cdots & 0 \\ 0 & -2(n - 2)^2 & (2n - 5)^2 + 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 2 \end{pmatrix}$.


Also, $\boldsymbol\alpha'v^{-1}\boldsymbol\alpha = n$, from (5.21), and $\boldsymbol\alpha'v^{-1} = \mathbf{1}'$, from (5.23). Thus the ordered
least-squares estimate of the parameter $\lambda$ is the sample mean, which is also the
maximum likelihood estimate.

To determine the variance matrix $v$ in any particular case, the author has
found it simpler to invert the matrix $v^{-1}$ by triangular resolution [6] than to
evaluate the integrals which give the individual elements of $v$. It is also possible
to use this matrix $v$, obtained by inversion of $v^{-1}$, to compute the elements of the
expectation vector $\boldsymbol\alpha$.

We note that for the reduced exponential distribution


$f(y) = e^{-y}$, $\mathcal{E}(Y) = 1$, and $\mathrm{var}(Y) = 1$,


and therefore the upper bound for $\mathrm{var}(\hat\lambda)$ is $\lambda^2/n$. But $\mathrm{var}(\hat\lambda) = \lambda^2/n$, and hence
condition (5.10) operates. We have therefore that $v\mathbf{1} = \boldsymbol\alpha$, or that the elements
of $\boldsymbol\alpha$ are the row sums of $v$.
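
These statements about the exponential case are easy to confirm numerically. The Python sketch below is illustrative only: it assumes NumPy, the function name `exponential_moments` is hypothetical, and the moments are built from the standard representation of exponential order statistics as cumulated sums of $1/(n - j + 1)$ and $1/(n - j + 1)^2$, which is a textbook fact rather than something stated in this section.

```python
import numpy as np


def exponential_moments(n):
    """alpha_r and v_rs for ordered observations from f(y) = exp(-y)."""
    steps = 1.0 / np.arange(n, 0, -1)            # 1/n, 1/(n-1), ..., 1
    alpha = np.cumsum(steps)                     # expectations
    var_diag = np.cumsum(steps ** 2)             # variances
    idx = np.minimum.outer(np.arange(n), np.arange(n))
    v = var_diag[idx]                            # cov(y_(r), y_(s)) = var(y_(min(r, s)))
    return alpha, v


if __name__ == "__main__":
    n = 5
    alpha, v = exponential_moments(n)
    v_inv = np.linalg.inv(v)
    print(np.round(alpha @ v_inv, 10))           # weights of the ordered observations
    print(round(float(alpha @ v_inv @ alpha), 10))
    print(np.round(v_inv[:2, :3], 10))           # leading corner of the inverse
```

All weights come out equal to one, so the estimate is the sample mean; the leading corner of the inverse shows the tridiagonal pattern, with first diagonal element $\tfrac{1}{2}[(2n - 1)^2 + 1]$ and first off-diagonal element $-(n - 1)^2$.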

In this case an alternative method may be used for deriving the variance
matrix. It may be noted that since increasing the sample size by unity involves
adding a row and a column to the leading edges of the matrix $v^{-1}$, such an
increase in sample size will affect $v$ in a similar way. Thus, calculating the variance
matrix for any specific sample size $n$ enables us to determine the variance matrix
for a sample size $n + 1$, and successively for all sample sizes greater than or,
conversely, smaller than $n$.

Suppose $v_n$ and $v_{n+1}$ are the variance matrices, with inverses $v_n^{-1}$ and $v_{n+1}^{-1}$,
of samples of $n$ and $n + 1$ ordered observations, respectively, and suppose $v_n$
is known (by inverting $v_n^{-1}$, or otherwise). It may be easily shown that $v_{n+1}$ may
be represented in partitioned form by

$v_{n+1} = a\begin{pmatrix} 1 & -\mathbf{h}'v_n \\ -v_n\mathbf{h} & v_n/a + v_n\mathbf{h}\mathbf{h}'v_n \end{pmatrix}$


where $a = \{\tfrac{1}{2}[(2n + 1)^2 + 1] - \mathbf{h}'v_n\mathbf{h}\}^{-1}$ and $\mathbf{h}' = (-n^2, 0, 0, \cdots, 0)$. This
follows directly from the fact that in partitioned form


$v_{n+1}^{-1} = \begin{pmatrix} \tfrac{1}{2}[(2n + 1)^2 + 1] & \mathbf{h}' \\ \mathbf{h} & v_n^{-1} \end{pmatrix}$.


From the explicit expression for the matrix $v^{-1}$, above, it may be shown that
$\mathbf{1}'v_n^{-1} = (n^2, 0, 0, \cdots, 0) = -\mathbf{h}'$. Post-multiplying by $v_n$ gives $-\mathbf{h}'v_n = \mathbf{1}'$ and
hence $\mathbf{h}'v_n\mathbf{h} = n^2$, giving $a = (n + 1)^{-2}$. Thus $v_{n+1}$ and $v_n$ are connected by the
difference equation


$v_{n+1} = \dfrac{1}{(n + 1)^2}\begin{pmatrix} 1 & \mathbf{1}' \\ \mathbf{1} & (n + 1)^2v_n + \mathbf{1}\mathbf{1}' \end{pmatrix} = \dfrac{\mathbf{1}\mathbf{1}'}{(n + 1)^2} + \begin{pmatrix} 0 & \mathbf{0}' \\ \mathbf{0} & v_n \end{pmatrix}$.


A similar relation connecting $\boldsymbol\alpha_n$ and $\boldsymbol\alpha_{n+1}$, the respective expectation vectors, is
obtained from the fact that $\mathbf{1}'v = \boldsymbol\alpha'$. Thus $\boldsymbol\alpha_{n+1}$ may be partitioned, such that


$\boldsymbol\alpha'_{n+1} = \{1/(n + 1),\ \boldsymbol\alpha'_n + \mathbf{1}'/(n + 1)\} = \mathbf{1}'/(n + 1) + (0, \boldsymbol\alpha'_n)$,


which is the necessary difference equation. In the foregoing, the vectors $\mathbf{1}$ and $\mathbf{0}$
have the number of elements suitable to their context.
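
The two difference equations lend themselves to a short recursive computation. The following Python sketch, with hypothetical function names and NumPy assumed, builds the variance matrix and expectation vector for a sample of four starting from a sample of one, for which both quantities equal unity in the reduced exponential distribution.

```python
import numpy as np


def next_variance_matrix(v_n):
    """v_{n+1} = 11'/(n+1)^2 + diag-block(0, v_n)."""
    n = v_n.shape[0]
    v_next = np.full((n + 1, n + 1), 1.0 / (n + 1) ** 2)
    v_next[1:, 1:] += v_n
    return v_next


def next_expectation_vector(alpha_n):
    """alpha_{n+1} = 1/(n+1) + (0, alpha_n)."""
    n = alpha_n.shape[0]
    return np.concatenate(([0.0], alpha_n)) + 1.0 / (n + 1)


if __name__ == "__main__":
    v = np.array([[1.0]])        # n = 1: var(Y) = 1
    alpha = np.array([1.0])      # n = 1: E(Y) = 1
    for _ in range(3):
        v = next_variance_matrix(v)
        alpha = next_expectation_vector(alpha)
    print(np.round(alpha, 5))
    print(np.round(v, 5))
```

At each step the row sums of the matrix reproduce the expectation vector, in line with the relation between $v$ and $\boldsymbol\alpha$ noted for the exponential case.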


7. Pearson Type III distribution. Consider a variate $X$, whose density function
is given by


(7.1) $f(x) = x^{p-1}e^{-x/\lambda}/\Gamma(p)\lambda^p$, $0 \le x < \infty$.


If we assume $p$ is known, this is the Pearson Type III distribution depending
on a single dispersion parameter $\lambda$. Since it has the functional form discussed
in Section 5, the relationships (5.1), (5.2), and (5.4) to (5.11) hold. If $Y = X/\lambda$
we have for distribution (7.1) $\mathcal{E}(Y) = p$ and $\mathrm{var}(Y) = p$ so that from (5.9)


(7.2) $\mathrm{var}(\hat\lambda) \le \lambda^2/np$.


It is known from general theory that for an unbiased estimate $\lambda^*$ of $\lambda$, the
variance of $\lambda^*$ has a lower bound given by


$\mathrm{var}(\lambda^*) \ge \left[n\int (\partial \ln f/\partial\lambda)^2 f\,dx\right]^{-1}$.

Thus for any unbiased estimate $\lambda^*$ of $\lambda$,

(7.3) $\mathrm{var}(\lambda^*) \ge \lambda^2/np$.


Since $\hat\lambda$ is an unbiased estimate of $\lambda$, both (7.2) and (7.3) hold. This can be so
only if

(7.4) $\mathrm{var}(\hat\lambda) = \lambda^2/np$.

This means that the variance of the ordered least-squares estimate attains
its upper bound, so that $\hat\lambda$ is, necessarily, given by (5.11), that is, by $\hat\lambda = \mathbf{1}'\mathbf{x}/np$,
which is also the maximum likelihood estimate. This proves also, for the Pearson
Type III distribution, that $v\mathbf{1} = \boldsymbol\alpha$, since (7.4) can be true only under this
condition.

I am indebted to Dr. E. H. Lloyd for suggesting many improvements to the 
original draft of this paper.


ESTIMATION OF THE MEAN AND STANDARD DEVIATION BY
ORDER STATISTICS¹


By A. E. SARHAN 


Egyptian Medical Research Laboratories 


1. Summary. The aim of this paper is: (i) to find the best linear estimates of 
the means and standard deviations of the rectangular, triangular, exponential 
and double exponential populations; (ii) to compare the efficiencies of these 
estimates with some other estimates for small samples; (iii) to discuss the varia- 
tion of coefficients in the best linear estimates as the population varies. 


2. Introduction. In recent literature, linear combinations of the sample ordered 
values are used to provide estimates from random samples drawn from popula- 
tions with specified forms. Such statistics are termed systematic by Mosteller 
[8]. They are now in common use, because they provide very simple solutions of 
many important parametric problems of statistical estimation. Sometimes they 
are inefficient in the sense that they do not use all the information contained in 
the sample as it would be used by the best possible methods, which are however 
computationally more complicated. In this work we obtain the estimate (called
“best linear” for short) which is the best linear combination of the ranked
observations.


3. Rectangular population. The frequency distribution of a rectangular
population is


$f(y) = 1/\theta_2, \qquad \theta_1 - \tfrac{1}{2}\theta_2 \le y \le \theta_1 + \tfrac{1}{2}\theta_2,$


where $\theta_1$ is the mean and $\theta_2$ is the range. Let $y_1, y_2, \cdots, y_n$ be a sample of size
$n$ and let the observations be ordered to give $y_{(1)}, y_{(2)}, \cdots, y_{(n)}$, for $n \ge 2$.
Now consider the linear estimates


$\theta_1^* = \sum_{i=1}^{n} a_{1i}\, y_{(i)}, \qquad \theta_2^* = \sum_{i=1}^{n} a_{2i}\, y_{(i)}.$


The method of least squares will provide the best linear estimates of $\theta_1$ and $\theta_2$.
The estimates are (Lloyd [7])


(3.1) $\theta_1^* = \tfrac{1}{2}(y_{(1)} + y_{(n)}),$
(3.2) $\theta_2^* = (y_{(n)} - y_{(1)})(n + 1)/(n - 1),$


and the variances


Received 5/25/53, revised 9/30/55 

¹ This is a summary of a thesis submitted to Liverpool University on July 1, 1952. The
estimation of the parameters of the rectangular population, which was obtained
independently of Lloyd’s work [7], was included. The details of this case are not given here.
(3.3) $V(\theta_1^*) = \theta_2^2/2(n + 1)(n + 2),$
(3.4) $V(\theta_2^*) = 2\theta_2^2/(n + 2)(n - 1).$


The standard deviation ($\sigma$) can be estimated by $\theta_2^*/2\sqrt{3}$. For the special
case $f(y) = 1/\theta_2$, $0 \le y \le \theta_2$, the estimates are (Craig [2]):


(3.5) $\theta_1^* = (n + 1)\,y_{(n)}/2n,$

(3.6) $\theta_2^* = (n + 1)\,y_{(n)}/n,$

(3.7) $V(\theta_1^*) = \theta_2^2/4n(n + 2),$

(3.8) $V(\theta_2^*) = \theta_2^2/n(n + 2).$

The maximum likelihood estimates are in agreement with the best linear estimate
in both the general (Fisher [5]) and the special cases.
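The variances (3.3) and (3.4) can be checked exactly from the standard moments of uniform order statistics. The following is only an illustrative sketch, not part of the original derivation: it assumes the standardized rectangular population on (0, 1), so that $\theta_2 = 1$, uses the well-known covariance $i(n-j+1)/((n+1)^2(n+2))$ for $i \le j$, and all function names are hypothetical.

```python
from fractions import Fraction


def uniform_cov(i, j, n):
    """Covariance of the i-th and j-th order statistics (1-based) in a sample
    of size n from the rectangular population on (0, 1)."""
    i, j = min(i, j), max(i, j)
    return Fraction(i * (n - j + 1), (n + 1) ** 2 * (n + 2))


def variance_of_linear_form(coeffs, n):
    """Variance of sum_i coeffs[i-1] * u_(i) for the standardized population."""
    return sum(
        coeffs[i - 1] * coeffs[j - 1] * uniform_cov(i, j, n)
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    )


def rectangular_blue_coeffs(n):
    """Coefficients of (3.1) and (3.2): only the two extremes carry weight."""
    mean_c = [Fraction(0)] * n
    range_c = [Fraction(0)] * n
    mean_c[0] = mean_c[-1] = Fraction(1, 2)
    scale = Fraction(n + 1, n - 1)
    range_c[0], range_c[-1] = -scale, scale
    return mean_c, range_c


if __name__ == "__main__":
    for n in range(2, 7):
        mean_c, range_c = rectangular_blue_coeffs(n)
        v_mean = variance_of_linear_form(mean_c, n)
        v_range = variance_of_linear_form(range_c, n)
        # Compare with 1/2(n+1)(n+2) and 2/(n+2)(n-1), i.e. (3.3) and (3.4) with theta_2 = 1.
        print(n, v_mean, v_range)
```

Because the arithmetic is done with rational numbers, agreement with (3.3) and (3.4) is exact rather than approximate.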


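4. Triangular population. The frequency distribution of the triangular
population is


$f(y) = (4/\theta_2^2)\,(\tfrac{1}{2}\theta_2 - |y - \theta_1|), \qquad |y - \theta_1| \le \tfrac{1}{2}\theta_2,$

where $\theta_1$ is the mean and $\theta_2$ is the range. Standardizing the variable ($\theta_1 = \tfrac{1}{2}$, $\theta_2 = 1$)
to get


$f(x) = \begin{cases} 4x, & 0 \le x \le \tfrac{1}{2}, \\ 4(1 - x), & \tfrac{1}{2} \le x \le 1, \end{cases}$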


we have (Wilks [11])

(4.1) $E(x_r) = \int_0^{1/2} x_r f_1(x_r)\, dx_r + \int_{1/2}^{1} x_r f_2(x_r)\, dx_r$


where

$f_1(x_r) = K(2x_r^2)^{r-1}(1 - 2x_r^2)^{n-r}\,(4x_r),$

$f_2(x_r) = K[2(1 - x_r)^2]^{n-r}[1 - 2(1 - x_r)^2]^{r-1}\,4(1 - x_r),$


with $K = n!/(r - 1)!\,(n - r)!$ and


(4.2) $E(x_r^2) = \int_0^{1/2} x_r^2 f_1(x_r)\, dx_r + \int_{1/2}^{1} x_r^2 f_2(x_r)\, dx_r$


with the same notation for $f_1(x_r)$ and $f_2(x_r)$. Also, when $x_r < x_s$,


(4.3) $E(x_r x_s) = \int_0^1 \int_0^{x_s} x_r x_s f(x_r, x_s)\, dx_r\, dx_s$

$\quad = \int_0^{1/2}\int_0^{x_s} x_r x_s f_1(x_r, x_s)\, dx_r\, dx_s + \int_{1/2}^{1}\int_0^{1/2} x_r x_s f_2(x_r, x_s)\, dx_r\, dx_s + \int_{1/2}^{1}\int_{1/2}^{x_s} x_r x_s f_3(x_r, x_s)\, dx_r\, dx_s$


where

$f_1(x_r, x_s) = C(2x_r^2)^{r-1}(2x_s^2 - 2x_r^2)^{s-r-1}(1 - 2x_s^2)^{n-s}\,(4)^2 x_r x_s,$

$f_2(x_r, x_s) = C(2x_r^2)^{r-1}[1 - 2x_r^2 - 2(1 - x_s)^2]^{s-r-1}[2(1 - x_s)^2]^{n-s}\,(4)^2 x_r(1 - x_s),$

$f_3(x_r, x_s) = C[1 - 2(1 - x_r)^2]^{r-1}[2(1 - x_r)^2 - 2(1 - x_s)^2]^{s-r-1}[2(1 - x_s)^2]^{n-s}\,(4)^2 (1 - x_r)(1 - x_s),$

$C = n!/(r - 1)!\,(s - r - 1)!\,(n - s)!.$
The expected values, variances and covariances of the order statistic ($x_{(r)}$) are
shown in Table I up to sample size 5. For the best linear estimate of the mean,


(4.4) $\theta_1^* = \sum_{i=1}^{n} a_{1i}\, y_{(i)},$


the coefficients $a_{1i}$ are given in Table III. Since
(4.5) $V(y) = \theta_2^2/24$
we can estimate the standard deviation ($\sigma$) by ($\theta_2^*/\sqrt{24}$), and the coefficients
may be adjusted to give the best linear estimate of the standard deviation
($\sigma^*$). These adjusted coefficients for which


(4.6) $\sigma^* = \sum_{i=1}^{n} a_{2i}\, y_{(i)}$


are also shown in Table III.
Cramér [3] found for large samples that the ordinary sample mean is a better
estimate of the mean of the distribution than is the midrange; Table IV shows this
to be true also for small samples.
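The single-variable moments (4.1) and (4.2) are easy to evaluate numerically, which gives a quick check on individual entries of Table I. The sketch below is an illustration only: it assumes the standardized triangular population on (0, 1), integrates with a composite Simpson rule (a numerical choice made here, not the exact integration used for the table), and the function names are hypothetical.

```python
from math import factorial


def tri_pdf(x):
    """Density of the standardized triangular population on (0, 1)."""
    return 4 * x if x <= 0.5 else 4 * (1 - x)


def tri_cdf(x):
    """Distribution function of the standardized triangular population."""
    return 2 * x * x if x <= 0.5 else 1 - 2 * (1 - x) ** 2


def order_stat_pdf(x, r, n):
    """Density of the r-th order statistic in a sample of size n."""
    K = factorial(n) // (factorial(r - 1) * factorial(n - r))
    F = tri_cdf(x)
    return K * F ** (r - 1) * (1 - F) ** (n - r) * tri_pdf(x)


def simpson(g, a, b, panels=2000):
    """Composite Simpson rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def moment(r, n, power=1):
    """E(x_(r)^power), split at x = 1/2 as in (4.1) and (4.2)."""
    def g(x):
        return x ** power * order_stat_pdf(x, r, n)
    return simpson(g, 0.0, 0.5) + simpson(g, 0.5, 1.0)


if __name__ == "__main__":
    print(moment(1, 2), 23 / 60)                        # mean of the smaller of two
    print(moment(1, 3), 13 / 40)                        # mean of the smallest of three
    print(moment(1, 2, 2) - moment(1, 2) ** 2, 101 / 3600)  # variance, n = 2
```

The printed pairs compare the numerical value with the fraction 23/60, 13/40, or 101/3600 that results from carrying out the integration by hand.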
TABLE I
Exact expected values, variances, and covariances of the order statistic ($x_{(r)}$) in samples of
size n from standardized triangular populations. Each variance and covariance value must
be divided by the appropriate divisor given in the last column.


" r 


E(X+) 


23/60 


cov (X,,X:) cov (Xr, X2) cov (Xr,Xs) | cov (Xr,X4) cov (Xr, Xs) Divisor 


101 491 3,600 


13/40 726 


400 


33 ,600 


20/40 100 690 


1451/5040 2,399 261 ,651 105 , 529 


25,401,600 


2199/5040 651 418,509 275,409 170,781 


527 
794, 
1008 / 


2016 
2016 
2016 


665 | 2,034,710 
710 3,146,180 
952 | 2,143,008 


1,379,952 983 , 242 
2,143,008 1,530,940 
2,978,640 | 2,143,008 


630 , 871 
983,242 223,534,080 
1,379 , 952 


TABLE II 


Exact expected values, variances, and covariances of the order statistic (x,) in samples of 
size n from standardized double exponential populations. Each variance and covariance value 
must be divided by the appropriate divisor given in the last column. 


n | a(X cov(Xr, Xi) 


cov (Xr, Xe cov (Xr, Xs cov (Xr, Xs) | cov (Xr, Xs Divisor 


23 16 


— 133/96 


11/32 

— 305/192 
| —55/96 
| 0 
| 


1,354,983 
144,738 
243, 328 


4,315 
4,799 
444 ,738 


463 ,068 
258 , 208 


2,945 


243 ,328 
258 , 208 
323 ,648 


183 , 838 
197 ,668 
258 , 208 


170 , 233 
183 ,838 
243 ,328 


| 921,600 


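5. Exponential population. The frequency distribution of an exponential
population is


$f(y) = (1/\sigma)\, e^{-(y - \mu)/\sigma}, \qquad \mu \le y \le \infty.$


Let $x = (y - \mu)/\sigma$ to get $f(x) = e^{-x}$ for $0 \le x \le \infty$. Then


$E(y_{(r)}) = \mu + \sigma E(x_{(r)}),$


(5.1) $E(x_{(r)}) = K \int_0^{\infty} x_r (1 - e^{-x_r})^{r-1} e^{-x_r(n-r+1)}\, dx_r = \sum_{i=1}^{r} \frac{1}{(n - i + 1)},$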
TABLE III


Coefficients in the best linear estimates, based on the order statistic $y_{(i)}$ in different
populations of size n, for the mean, $\sum_{i=1}^{n} a_{1i}\, y_{(i)}$, where $a_{1i} = a_{1(n-i+1)}$, and the standard deviation,
$\sum_{i=1}^{n} a_{2i}\, y_{(i)}$, where $a_{2i} = -a_{2(n-i+1)}$


Sample size n   Population      a_11       a_21        a_12       a_22        a_13

2               Rectangular     .5000000   -.8660253
                Triangular      .5000000   -.8748178
                Normal          .5000000   -.8862269
                Dbl. expon.     .5000000   -.9428070

3               Rectangular     .5000000   -.5773503   0.
                Triangular      .3945578   -.5832118   .2108844
                Normal          .3333333   -.5908179   .3333333
                Dbl. expon.     .1481481   -.6222161   .7037638

4               Rectangular     .5000000   -.4811250   0.         0.
                Triangular      .3378906   -.4722486   .1621004   -.0541433
                Normal          .2500000   -.4539404   .2500000   -.1101807
                Dbl. expon.     .0472971   -.4307352   .4527030   -.3003697

5               Rectangular     .5000000   -.4330128   0.         0.          0.
                Triangular      .3060758   -.3994195   .1188518   -.0637213   .1501447
                Normal          .2000000   -.3723816   .2000000   -.1352139   .2000000
                Dbl. expon.     .0166355   -.3263380   .2213003   -.3169696   .5241287





TABLE IV


Percentage efficiencies of various estimators of the mean in different populations. Sample
mean = $\bar{y}$, midrange = $w$, median = $\tilde{y}$


Sample size          2                         3                         4                         5
Estimator      ȳ       w       ỹ         ȳ       w       ỹ         ȳ       w       ỹ         ȳ       w       ỹ
Rectangular    100.00  100.00  100.00     90.00  100.00  50.00      80.00  100.00  62.50      71.45  100.00  33.33
Triangular     100.00  100.00  100.00     98.82   96.58  66.83      97.72   92.71  74.53      96.70   89.25  60.47
Normal         100.00  100.00  100.00    100.00   91.99  73.55     100.00   83.99  83.89     100.00   76.70  69.69
Dbl. expon.    100.00  100.00  100.00     88.43   67.90  92.27      82.80   49.65  98.90      79.21   38.29  90.23


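where $K = n!/(r - 1)!\,(n - r)!$. Also


(5.2) $E(x_{(r)}^2) = K \int_0^{\infty} x_r^2 (1 - e^{-x_r})^{r-1} e^{-x_r(n-r+1)}\, dx_r = \sum_{i=1}^{r} \frac{1}{(n - i + 1)^2} + \left[\sum_{i=1}^{r} \frac{1}{(n - i + 1)}\right]^2,$


and, for $x_r < x_s$,


$E(x_r x_s) = C \int_0^{\infty}\int_0^{x_s} x_r x_s (1 - e^{-x_r})^{r-1}(e^{-x_r} - e^{-x_s})^{s-r-1}(e^{-x_s})^{n-s+1} e^{-x_r}\, dx_r\, dx_s,$

where $C = n!/(r - 1)!\,(s - r - 1)!\,(n - s)!$. Finally


(5.3) $\operatorname{cov}(x_r x_s) = \sum_{i=1}^{r} \frac{1}{(n - i + 1)^2}, \qquad r \le s.$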

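Therefore,

$V^{-1} = \begin{pmatrix}
n^2 + (n-1)^2 & -(n-1)^2 & 0 & \cdots & 0 \\
-(n-1)^2 & (n-1)^2 + (n-2)^2 & -(n-2)^2 & \cdots & 0 \\
0 & -(n-2)^2 & (n-2)^2 + (n-3)^2 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}$

and, with $A = (\mathbf{1}, \alpha)$, where $\alpha_r = E(x_{(r)}) = \sum_{i=1}^{r} 1/(n - i + 1)$,
it follows that


(5.4) $(A'V^{-1}A)^{-1} = \frac{1}{n(n - 1)} \begin{pmatrix} 1 & -1 \\ -1 & n \end{pmatrix}.$

Therefore,
(5.5) $\mu^* = [n y_{(1)} - \bar{y}]/(n - 1),$


(5.6) $\sigma^* = [\bar{y} - y_{(1)}]\, n/(n - 1).$


These estimates are in agreement with the maximum likelihood estimates. From
(5.4) we have,


(5.7) $V(\mu^*) = \sigma^2/n(n - 1),$


(5.8) $V(\sigma^*) = \sigma^2/(n - 1).$
Since the mean is equal to $\mu + \sigma$, the estimate of the mean is


(5.9) $[1 \;\; 1] \begin{pmatrix} \mu^* \\ \sigma^* \end{pmatrix} = \frac{1}{(n - 1)}\, [1 \;\; 1] \begin{pmatrix} n y_{(1)} - \bar{y} \\ n\bar{y} - n y_{(1)} \end{pmatrix} = \bar{y}.$

Therefore, the best linear estimate of the mean of the exponential population is
the sample mean.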
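The least-squares computation leading to (5.4), (5.5) and (5.6) can be reproduced exactly for small n. The sketch below is illustrative and rests on stated assumptions: it takes the standardized moments from (5.1) and (5.3), forms the generalized least-squares coefficients $(A'V^{-1}A)^{-1}A'V^{-1}$ (the form used in Lloyd [7]), and works in rational arithmetic. The helper names `solve`, `exp_moments` and `blue` are hypothetical.

```python
from fractions import Fraction


def exp_moments(n):
    """alpha_r = E(x_(r)) and V = cov(x_(r), x_(s)) for the standardized
    exponential population, from (5.1) and (5.3)."""
    alpha, var = [], []
    s1 = s2 = Fraction(0)
    for i in range(1, n + 1):
        s1 += Fraction(1, n - i + 1)
        s2 += Fraction(1, (n - i + 1) ** 2)
        alpha.append(s1)
        var.append(s2)
    V = [[var[min(r, s)] for s in range(n)] for r in range(n)]
    return alpha, V


def solve(M, B):
    """Return M^{-1} B by Gauss-Jordan elimination (exact for Fractions)."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def blue(n):
    """Coefficient rows for (mu*, sigma*) and the matrix (A'V^{-1}A)^{-1}."""
    alpha, V = exp_moments(n)
    A = [[Fraction(1), alpha[i]] for i in range(n)]
    W = solve(V, A)  # V^{-1} A, an n x 2 matrix
    G = [[sum(A[i][a] * W[i][b] for i in range(n)) for b in range(2)]
         for a in range(2)]  # A' V^{-1} A
    Wt = [[W[i][a] for i in range(n)] for a in range(2)]
    coeffs = solve(G, Wt)  # (A'V^{-1}A)^{-1} A' V^{-1}
    G_inv = solve(G, [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]])
    return coeffs, G_inv


if __name__ == "__main__":
    coeffs, G_inv = blue(4)
    print("mu* coefficients:   ", coeffs[0])
    print("sigma* coefficients:", coeffs[1])
    print("(A'V^-1 A)^-1:      ", G_inv)
```

Adding the two coefficient rows gives the weights of the estimate of the mean $\mu + \sigma$; in this sketch every weight comes out as 1/n, in line with the conclusion that the sample mean is the best linear estimate.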
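6. Double exponential population. The frequency distribution of the double
exponential population with mean $\mu$ and variance $2\sigma^2$ is


$f(y) = e^{-|y - \mu|/\sigma}/2\sigma.$


Let $x = (y - \mu)/\sigma$ giving $f(x) = \tfrac{1}{2} e^{-|x|}$ for $-\infty \le x \le \infty$, and $E(y_{(r)}) = \mu + \sigma E(x_{(r)})$. Then,


(6.1) $E(x_{(r)}) = \int_{-\infty}^{0} x_r f_1(x_r)\, dx_r + \int_{0}^{\infty} x_r f_2(x_r)\, dx_r$


where


$f_1(x_r) = K(\tfrac{1}{2} e^{x_r})^{r}\,[1 - \tfrac{1}{2} e^{x_r}]^{n-r},$

$f_2(x_r) = K(1 - \tfrac{1}{2} e^{-x_r})^{r-1}(\tfrac{1}{2} e^{-x_r})^{n-r+1},$


with $K = n!/(r - 1)!\,(n - r)!$. Also


(6.2) $E(x_{(r)}^2) = \int_{-\infty}^{0} x_r^2 f_1(x_r)\, dx_r + \int_{0}^{\infty} x_r^2 f_2(x_r)\, dx_r$


with the same notations for $f_1(x_r)$ and $f_2(x_r)$. Finally,


$E(x_{(r)} x_{(s)}) = \int_{-\infty}^{\infty}\int_{-\infty}^{x_s} x_r x_s f(x_r, x_s)\, dx_r\, dx_s$

$\quad = \int_{-\infty}^{0}\int_{-\infty}^{x_s} x_r x_s f_1(x_r, x_s)\, dx_r\, dx_s + \int_{0}^{\infty}\int_{-\infty}^{0} x_r x_s f_2(x_r, x_s)\, dx_r\, dx_s + \int_{0}^{\infty}\int_{0}^{x_s} x_r x_s f_3(x_r, x_s)\, dx_r\, dx_s$

where

$f_1(x_r, x_s) = C(\tfrac{1}{2})^{s}\, e^{r x_r}(e^{x_s} - e^{x_r})^{s-r-1}(1 - \tfrac{1}{2} e^{x_s})^{n-s}\, e^{x_s},$

$f_2(x_r, x_s) = C(\tfrac{1}{2} e^{x_r})^{r}(1 - \tfrac{1}{2} e^{x_r} - \tfrac{1}{2} e^{-x_s})^{s-r-1}(\tfrac{1}{2} e^{-x_s})^{n-s+1},$

$f_3(x_r, x_s) = C(\tfrac{1}{2})^{n-r+1}(1 - \tfrac{1}{2} e^{-x_r})^{r-1}(e^{-x_r} - e^{-x_s})^{s-r-1}\, e^{-x_r}\, e^{-(n-s+1)x_s},$

with $C = n!/(r - 1)!\,(s - r - 1)!\,(n - s)!$.

The exact expected values, variances, and covariances of the order statistic
($x_{(r)}$) are given in Table II for n = 2, 3, 4, 5. The missing entries in the table
may be obtained by $E(x_{(r)}) = -E(x_{(n-r+1)})$ for expected values and $\operatorname{cov}[x_{(r)} x_{(s)}] = \operatorname{cov}[x_{(n-r+1)} x_{(n-s+1)}]$ for covariances.

The coefficients ($a_{1i}$) in the best linear estimate $\mu_D^* = \sum_{i=1}^{n} a_{1i}\, y_{(i)}$ of the
mean are given in Table III.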

Since $\sigma_D^2 = 2\sigma^2$, the coefficients in the best linear estimate of $\sigma$ can be
adjusted to give the coefficients in the best linear estimate of the standard
deviation ($\sigma_D$). The adjusted coefficients ($a_{2i}$) for which $\sigma_D^* = \sum_{i=1}^{n} a_{2i}\, y_{(i)}$ are given
in Table III.

GRAPH (1). Percentage efficiencies of the sample mean, midrange, and median in different populations. [Graph not reproduced. Panels: sample mean, midrange, median; vertical axis: percentage efficiency.]

Comparison of efficiencies of different estimates of the mean shows that the 
median is more efficient than either the sample mean or the midrange and less 
efficient than the best linear estimate. The maximum likelihood estimates of the 
mean and the standard deviation of this population are the median and the mean 
deviation about the median (Fisher [6]), respectively. Neither is efficient for 
small samples. 
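The expectations (6.1) and (6.2) can be checked numerically, together with the symmetry relation used to fill the missing entries of Table II. The following sketch is illustrative only: it assumes the standardized double exponential density $\tfrac{1}{2}e^{-|x|}$, truncates the infinite range at a cutoff of 60 (an arbitrary numerical choice made here), and uses a composite Simpson rule. The function names are hypothetical.

```python
from math import exp, factorial


def laplace_pdf(x):
    """Standardized double exponential density."""
    return 0.5 * exp(-abs(x))


def laplace_cdf(x):
    """Standardized double exponential distribution function."""
    return 0.5 * exp(x) if x < 0 else 1 - 0.5 * exp(-x)


def order_stat_pdf(x, r, n):
    """Density of the r-th order statistic in a sample of size n."""
    K = factorial(n) // (factorial(r - 1) * factorial(n - r))
    F = laplace_cdf(x)
    return K * F ** (r - 1) * (1 - F) ** (n - r) * laplace_pdf(x)


def simpson(g, a, b, panels):
    """Composite Simpson rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def moment(r, n, power=1, cutoff=60.0, panels=20000):
    """E(x_(r)^power), split at zero as in (6.1) and (6.2)."""
    def g(x):
        return x ** power * order_stat_pdf(x, r, n)
    return simpson(g, -cutoff, 0.0, panels) + simpson(g, 0.0, cutoff, panels)


if __name__ == "__main__":
    print(moment(1, 2), -3 / 4)                          # smaller of two
    print(moment(1, 2, 2) - moment(1, 2) ** 2, 23 / 16)  # its variance
    print(moment(1, 4), -133 / 96, moment(2, 4), -11 / 32)
    print(moment(3, 4), -moment(2, 4))                   # symmetry relation
```

Each printed line pairs a numerical integral with the value it is expected to approximate, so any disagreement beyond integration error would point to a transcription problem in the formulas.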


7. Comparison of different estimates. Table IV is constructed to give the 
percentage efficiencies of midrange, median and sample mean as estimates of 
the population mean (relative to the best linear estimate of the mean). The 
comparison of efficiencies of the estimates in the different populations may easily 
be seen in Graph 1. 

The sample mean is the best linear estimate of the mean of a normal popula- 
tion, so in general we expect that its efficiency decreases in both platykurtic 
and leptokurtic populations. The efficiency of the midrange decreases in normal 
populations and again in leptokurtic populations. The median behaves in a 
reverse way: its efficiency is high in leptokurtic, decreases in normal and again 
in platykurtic populations. 

Table V gives the expected values of different estimates of the standard de- 
viation in different populations. By the normal estimate in this table is meant 
that linear estimate of the standard deviation of the given population which is 
best for a normal population (Godwin [4]). Further, Gini’s estimate is that ob- 
tained by using Gini’s mean difference (Nair [9]) which may be expressed as 


$\Delta_1 = 2[2U - (n + 1)V]/n(n - 1)$

TABLE V


Expected values of the normal and Gini’s estimates of the standard deviation 
in different populations 


| j 

n= 2 n= 3 n= 4 n= 5 
Population 
| 


Gini’s Normal Gini’s | Normal Gini’s 


Rectangular| 1.02333 | a 1.02333 | 1. | 1.01983 1. 1.01611 | 1 
Triangular. .| 1.01304 | 1.13221 | 1.01304 | 1.13221 1.01213 | 1.13221 | 1.01115 | 1 
Normal } 1. 1. 1. ri 2. oe et. i 
Dbl. expon.| .93999 | 1.06066 | .93999 | 1.06066 | .94296 | 1.06066 | .94612 1. 


' 


13221 


- 

| Normal! Gini’s | Normal 
} 
| 
| 


06066 


TABLE VI 
Percentage efficiencies of the range, normal, and Gini’s estimates of the standard deviation 


in different populations, from ordered samples of size n 


n= 2 n= 3 n=4 


Population 
Nor 


Gini’s Range onal 


onal Gini’s| Range Normal Gini’s Range |Normal) Gini’s 


Rectangular. . 100 100 | 100 97.38) 95.2: 93.45) 88.04 
Triangular 100 100 100 | 99.77) 99.77) 97. 99.02) 98.66) 95.29 
Normal 100 100 98.78 99.72) 96.95 99.48 
Dbl. expon... 100 100 | 100 | 94.75 97.99 99.99) 89.83 96.62 99.75 


where $U = \sum_{i=1}^{n} i\, y_{(i)}$ and $V = \sum_{i=1}^{n} y_{(i)}$. Table V shows that the normal
estimates are biased. Since the normal estimate is the best linear estimate of the
standard deviation of a normal population, we may expect in general that with a
platykurtic population the estimate is too high and with a leptokurtic
population it is too small.

The efficiencies of different estimates of the standard deviation relative to the 
best linear estimate are given in Table VI, and a graphical representation in 
Graph 2. The efficiency of the range, as an estimate of the standard deviation, is 
greater in the rectangular population, decreases in the normal and again in the 
double exponential. The efficiency should generally be higher in platykurtic 
populations, decrease in the normal and again in leptokurtic populations. Fur- 
thermore, the efficiency of the normal estimate should generally decrease in 
both platykurtic and leptokurtic populations. 

The efficiency of Gini’s estimate is higher in the double exponential than in
the normal, and decreases in the triangular and again in the rectangular population.
In the case of the double exponential, Gini’s estimate is more efficient than either 
the range or the normal estimate, and it is nearly as efficient as the best linear 
estimate, so we can use it as an estimate of standard deviation in samples from 
this population. Its coefficients are very simple and so it can be calculated quickly 
and easily. 

GRAPH (2). Percentage efficiencies of the range, Gini’s estimate, and normal estimate in different populations. [Graph not reproduced.]

In the normal population, Gini’s estimate is shown to be nearly as efficient as
the best linear estimate and more efficient than the range. So the estimate $G = \tfrac{1}{2}\sqrt{\pi}\,\Delta_1$
may be used as an estimate of standard deviation for a normal population;
it will be considerably more reliable as n increases than that based on the
range. If the functional form of the population distribution is unknown, we may
use the best normal estimate or, better, Gini’s estimate, as an estimate of the
standard deviation, because of their high efficiencies. A further advantage of the
latter estimate is that its expectation is independent of n, so that it can be used
as an unbiased estimate.
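Gini’s mean difference is simple to compute from the ordered sample. The sketch below is an illustration with hypothetical function names: it evaluates $2[2U - (n + 1)V]/n(n - 1)$, with $U$ the sum of rank times ordered value and $V$ the sum of the observations, compares it with the direct average of absolute pairwise differences, and applies the factor $\tfrac{1}{2}\sqrt{\pi}$ that makes the result unbiased for the standard deviation of a normal population. The data values are invented for the example.

```python
from itertools import combinations
from math import pi, sqrt


def gini_mean_difference(sample):
    """Gini's mean difference from the ordered sample: 2[2U - (n+1)V]/n(n-1)."""
    y = sorted(sample)
    n = len(y)
    U = sum(rank * value for rank, value in enumerate(y, start=1))
    V = sum(y)
    return 2 * (2 * U - (n + 1) * V) / (n * (n - 1))


def gini_pairwise(sample):
    """The same quantity as the mean absolute difference over all pairs."""
    n = len(sample)
    total = sum(abs(a - b) for a, b in combinations(sample, 2))
    return 2 * total / (n * (n - 1))


def normal_sd_from_gini(sample):
    """Estimate of a normal standard deviation: (sqrt(pi)/2) * mean difference."""
    return 0.5 * sqrt(pi) * gini_mean_difference(sample)


if __name__ == "__main__":
    data = [3.1, 0.4, 2.2, 5.0, 1.7]  # made-up observations
    print(gini_mean_difference(data), gini_pairwise(data))
    print(normal_sd_from_gini(data))
```

The order-statistic form needs only one sort and two sums, which is the computational simplicity the text refers to; the pairwise form is included purely as a cross-check.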


8. Variation of coefficients in the best linear estimates. Table III gives the 
coefficients in the best linear estimates of different populations. Comparing the 
coefficients in the best linear estimate of the mean, we can see that equal weights 
are given to the sample elements in the case of the normal population, while 
smaller weights are given to the middle elements in samples from a triangular 
population than those given to the extreme elements, and zero coefficients are 
attached to all elements other than the two extremes in samples from a rectangu- 
lar population. 


Again in the case of the double exponential population, the weight is largest 
for the middle sample element, decreases gradually, and becomes least for the 
extreme elements. In general, we expect that the more platykurtic a population 
is, the greater should be the weights given to the extreme elements compared 
with those given to the middle elements. 
TABLE VII


Variances of the different estimates of standard deviation in different populations ($\sigma = 1$)


n= 2 n=3 n=4 ne 5 
Popula 
tion 


Nor- 
mal 


Nor N 


Gini’s. Best Range oat Gini’s' Best ‘Range a? Gini'’s Best Range Nor 


Best Range Gini's 
mal 


Rectan 


gular 5000 -5236, .5000 2004 -lill 1167 
Triangu 

lar 5306 .5306 .5445 .6933 . 2415, .2478 . 1514, . 2000. 1099 
Normal 5708 .5708 5708. 2755 -2755| .1801) . 1806. . 1375 
Dbl. ex-, 

pon. -7778 .7778 .6872 .8750 -4321' .3818 .4863 .2086) . -2711, .3314 .2288) .2547 
Expon... 1.0000 1.0000 .78541.0000 .5000 .5555 .4363 .5555 .3333| .4049 .3085, .3899 .2500 .3280 .2404 


(Best denotes the best linear estimate; Normal, the normal estimate; Gini’s, the Gini’s mean difference. ) 


Comparing the coefficients in the best linear estimate of the standard deviation 
of different populations shows that no weight is given to the middle element, as 
is to be expected because the populations are symmetric. Similarly, we see that 
the more platykurtic a population is, the smaller the weights given to the middle 
elements and the larger the weights given to the extreme elements. 


Conclusions. I would like to point out a few problems raised but not solved 
in this paper. 

(i) The reverse problem, that is, to find the population or the set of
populations, if any, for which a given estimate is the best linear estimate, has not yet
been attacked. It would be of interest, for example, to know the population whose
best linear estimate for standard deviation is Gini’s mean difference.

(ii) Table VII gives the variances of the best linear estimates of the standard 
deviations for the given populations. It shows that the variances of the estimates 
for the rectangular population are the least among the given populations. This 
raises the problem of finding the population whose standard deviation can be 
estimated with the least variance. 

(iii) When general expressions of the expected values, variances and covari- 
ances are known, the best linear estimates from samples of size n can generally 
be obtained. However, in many cases such expressions are not possible, or are 
very difficult to obtain. In these cases we must find them separately for each 
value of n, which will be tedious for large values. It may be possible to find a 
new method or approximation, by means of which we can find the linear estimates 
without these tedious calculations. 

(iv) It has been shown that the coefficients in the best linear estimates vary 
with varying shapes of the distributions, but it seems that this is not the only 
relationship. One could relate the coefficients directly to known properties of 
the distribution functions, such as moments or semi-invariants. However, this 
problem seems to be difficult. 

Finally, I would like to express my acknowledgement to Mr. R. L. Plackett 
for suggesting the problem and for his help during his supervision. I am also 
greatly indebted to the referee for his comments and to Dr. W. Hoeffding, Dr. 
E. L. Lehmann and Dr. B. G. Greenberg for their kind help in revising the manu-
script for publication. 


REFERENCES 

[1] A. C. Aitken, “On least squares and linear combination of observations,” Proc. Roy.
Soc. Edinburgh, Vol. 55 (1935), pp. 42-48.

[2] A. T. Craig, “A note on the best linear estimate,” Ann. Math. Stat., Vol. 14 (1943),
pp. 88-90.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946,
pp. 178-179, 372-373.

[4] H. J. Godwin, “On the estimation of the dispersion by linear systematic statistics,”
Biometrika, Vol. 36 (1949), pp. 92-100.

[5] R. A. Fisher, “On the mathematical foundation of theoretical statistics,” Trans. Roy.
Soc. London Series A, Vol. 222 (1921), pp. 309-368.

[6] R. A. Fisher, “A mathematical examination of the methods of determining the
accuracy of an observation by the mean error, and by the mean square error,”
Contributions to Mathematical Statistics, 1922, p. 2.769.

[7] E. H. Lloyd, “Least squares estimation of location and scale parameters using order
statistics,” Biometrika, Vol. 39 (1952), pp. 88-95.

[8] F. Mosteller, “On some useful inefficient statistics,” Ann. Math. Stat., Vol. 17 (1946),
pp. 377-407.

[9] U. S. Nair, “The standard error of Gini’s mean difference,” Biometrika, Vol. 23 (1936),
pp. 428-434.

[10] R. L. Plackett, Lecture notes on the method of least squares.

[11] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, pp. 89-92.





ON MULTIVARIATE DISTRIBUTION THEORY' 


By I. OLKIN AND S. N. ROY
Michigan State College and University of North Carolina 


1. Summary. This paper is concerned with a matrix method of deriving the 
sampling distributions of a large class of statistics directly from the probability 
law for random samples from a multivariate normal population, that is without 
assuming the Wishart distribution or the distribution of rectangular coordinates. 
Two techniques are proposed for evaluating the Jacobians of certain transforma- 
tions, one based on a theorem on Jacobians [1], and the second based on the intro- 
duction of pseudo or extra variables. This matrix approach has a geometrical 
analog developed in part by one of the authors [2]. Section 3 is concerned with a 
discussion of these two techniques; in Section 4, the former is applied to obtain 
the joint distribution of the rectangular coordinates [3], and in Section 5, the 
second method is applied to obtain the joint distribution of the roots of a determi- 
nantal equation [4], [5], [6], and [7]. 


2. Introduction. Much work on the sampling distributions connected with 
multivariate normal populations is based on the Wishart distribution as the 
starting point, from which analytical or geometrical arguments are applied to 
obtain the desired results. This presupposes that the Wishart distribution is 
available and that it was somehow derived from the probability law for the raw 


observations. When the Wishart distribution is unavailable, as in the case of a 
sample of N observations from a p-variate normal population with p > N — 1, 
other techniques must be applied. 

When considering the roots of the determinantal equation $|XX' - \theta YY'| = 0$,
where X and Y are p by n and p by m matrices, respectively, consisting of the
observations from p-variate normal populations with n < p < m, a lemma may
be employed [6] to the effect that the nonvanishing roots of $|UU' - \theta I_r| = 0$
are the roots of $|U'U - \theta I_s| = 0$, where U is an r by s matrix, r > s. However,
by starting with the raw observations, the availability of the Wishart distribu- 
tion need not be considered. A discussion of the above examples is given in 
[8] and [9]. The present approach is based on matrix algebra and is proposed as 
an alternative and unified procedure to obtain most multivariate distributions. 

As the shape and properties of the matrices are of importance, the following 
notation is adopted. 

(i) A matrix is denoted by a capital letter, and a column vector by an
underlined lower case letter, for example $\underline{x}' = (x_1, \cdots, x_p)$.

(ii) $X: p \times n$ means that the matrix X has p rows and n columns.

(iii) A triangular matrix with zeros above the main diagonal is denoted by
a superior tilde, for example $\tilde{X}$.

(iv) A skew-symmetric matrix is denoted by a superior caret, for example $\hat{X}$.

(v) An orthogonal matrix is denoted by a capital Greek letter.

(vi) The absolute value of the Jacobian, $\partial(y_{11} \cdots y_{pn})/\partial(x_{11} \cdots x_{pn})$, is
denoted by $J(Y; X)$.


3. General procedure. Let $X: p \times n$ be pn random variables following the
multivariate normal probability law as defined by the density function


$p(X) = (2\pi)^{-pn/2}\, |A|^{-n/2} \exp[-\tfrac{1}{2} \operatorname{tr}(A^{-1}XX')]$


where $A: p \times p$ is the population covariance matrix, and the means have already
been integrated out. If $X = g(Y)$ is a one-one transformation in the pn variates,
the density function of Y is given by


$P(Y) = p(g(Y))\, J(X; Y).$


In most cases, we are only interested in the distribution of a subset Z of the pn 
variates, and we may achieve this by integrating out those variates which are 
in Y but not in Z. Thus the problem consists in (a) finding an appropriate trans- 
formation, (b) evaluating the Jacobian of the transformation, and (c) integrating 
out any extraneous variates. This paper is concerned with (b) and (c); the requi- 
site transformations (a) are assumed to be available [8]. 

In the process of evaluating the Jacobian, a difficulty arises whenever the
transformation involves an orthogonal matrix, and in particular when the
orthogonality is with respect to rows alone, for example $\Gamma: p \times n$, $\Gamma\Gamma' = I_p$, $p \le n$.
Such a matrix contains $pn - p(p + 1)/2$ independent variates, and the usual
procedure for evaluating the Jacobian requires the solution of $p(p - 1)/2$
equations so that the dependent elements may be expressed in terms of the
independent elements. This is troublesome even if p = 3. The two techniques
proposed avoid this arduous task. The first makes use of the following theorem
on Jacobians.

THEOREM 1. If $y_i = f_i(\underline{x}, \underline{\xi})$, $(i = 1 \cdots m)$, $\underline{x} = (x_1 \cdots x_m)$, $\underline{\xi} = (x_{m+1} \cdots x_{m+n})$, where $\underline{x}$ and $\underline{\xi}$ are subject to n constraints $f_i(\underline{x}, \underline{\xi}) = 0$
$(i = m + 1, \cdots, m + n)$, then


$J(y_1 \cdots y_m; \underline{x}) = J(f_1 \cdots f_{m+n}; \underline{x}, \underline{\xi}) / J(f_{m+1} \cdots f_{m+n}; \underline{\xi}),$


provided that the numerator and denominator exist and do not vanish [1].

The conditions $\Gamma\Gamma' - I_p = 0$ constitute the constraints of the theorem.
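A one-dimensional case shows how Theorem 1 is used. The sketch below is a toy illustration chosen here, not an example from the paper: it takes m = n = 1, the function $y = x + \xi$, and the single constraint $x^2 + \xi^2 - 1 = 0$ on the branch $\xi > 0$, and compares the ratio of determinants with a finite-difference derivative of y along the constraint. All names are hypothetical.

```python
from math import sqrt


def xi_on_constraint(x):
    """Dependent variable on the branch xi > 0 of x^2 + xi^2 - 1 = 0."""
    return sqrt(1 - x * x)


def y_of_x(x):
    """y = f_1(x, xi) = x + xi, evaluated on the constraint."""
    return x + xi_on_constraint(x)


def direct_jacobian(x, h=1e-6):
    """|dy/dx| along the constraint, by a central finite difference."""
    return abs((y_of_x(x + h) - y_of_x(x - h)) / (2 * h))


def theorem_ratio(x):
    """J(f_1, f_2; x, xi) / J(f_2; xi) with f_2 = x^2 + xi^2 - 1."""
    xi = xi_on_constraint(x)
    # Rows of the numerator matrix: (df1/dx, df1/dxi) = (1, 1),
    # (df2/dx, df2/dxi) = (2x, 2xi).
    numerator = abs(1 * (2 * xi) - 1 * (2 * x))
    denominator = abs(2 * xi)
    return numerator / denominator


if __name__ == "__main__":
    for x in (0.1, 0.3, 0.6):
        print(x, direct_jacobian(x), theorem_ratio(x))
```

The point of the theorem is visible even here: the ratio needs only partial derivatives of the constraint, whereas the direct route requires solving the constraint for the dependent variable first.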

The second procedure involves the introduction of pseudo or extra variates
as follows. Let $\Gamma_1: p \times n$, $p \le n$, $\Gamma_1\Gamma_1' = I_p$, and write


$\Gamma_1 = (I, 0) \begin{pmatrix} \Gamma_1 \\ \Gamma_2 \end{pmatrix} = (I, 0)\Gamma,$


where $0: p \times (n - p)$, and $\Gamma_2: (n - p) \times n$ is so chosen that $\Gamma: n \times n$ is orthogonal.
If $|I + \Gamma| \ne 0$, there is a one-one correspondence between $\Gamma$ and a
skew-symmetric matrix S given by $S = (I + \Gamma)^{-1}(I - \Gamma)$ [10]. This S has the desirable
property that all its elements are independent. The transformation may then
directly involve S rather than $\Gamma$.
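The correspondence between an orthogonal matrix and a skew-symmetric matrix can be verified exactly on a small case. The sketch below is illustrative only: it starts from an arbitrarily chosen 3 by 3 skew-symmetric S with rational entries, applies the map $M \mapsto (I + M)^{-1}(I - M)$ to obtain an orthogonal matrix, and applies the same map again to recover S. The helper names are hypothetical, and the particular S is made up for the example.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def add(A, B, sign=1):
    """Return A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Exact inverse by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(M[i]) + identity(n)[i] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def cayley(M):
    """The map M -> (I + M)^{-1} (I - M)."""
    I = identity(len(M))
    return matmul(inverse(add(I, M)), add(I, M, -1))


if __name__ == "__main__":
    h, t = Fraction(1, 2), Fraction(1, 3)
    S = [[Fraction(0), h, -t],
         [-h, Fraction(0), Fraction(2)],
         [t, Fraction(-2), Fraction(0)]]
    G = cayley(S)
    print(matmul(G, transpose(G)) == identity(3))  # G is orthogonal
    print(cayley(G) == S)                          # the map is its own inverse
```

The three entries above the diagonal of S are free, which matches the count n(n - 1)/2 of independent elements in an n by n orthogonal matrix for n = 3.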


4. Distribution of rectangular coordinates. The existence of the desired trans- 
formation is given by the following theorem. 

THEOREM 2. If $X: p \times n$, $p \le n$, there exists an orthogonal matrix $\Gamma: p \times n$,
and a triangular matrix $\tilde{T}: p \times p$ with $t_{ii} \ge 0$, $(i = 1 \cdots p)$ such that


(1) $X = \tilde{T}\Gamma.$


If X is of rank p, then $t_{ii} > 0$ $(i = 1 \cdots p)$, and the representation is unique [ ].
We note that X is of rank p with probability 1, and hence the representation
is unique with probability 1. The elements of $\tilde{T}$ are the rectangular coordinates
[3]. Before proceeding to the evaluation of the Jacobian, we will have occasion
to use the following lemma.
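LEMMA 3. Let $\Gamma: p \times n$, $p \le n$, $\Gamma\Gamma' = I_p$, and denote the set of $pn - p(p + 1)/2$
independent elements by $\Gamma_I$. If no $\gamma_{ij}$ of $\Gamma$ is zero, then for each $\Gamma_I$ there are $2^p$
matrices $\Gamma$ which can be formed.


PROOF. Without any loss in generality we can let $\Gamma_I$ consist of
$\gamma_{ij}$ $(i = 1 \cdots p;\ j = 1 \cdots n - i)$,
and the dependent set $\Gamma_D$ consist of $\gamma_{ij}$ $(i = 1 \cdots p;\ j = n - i + 1, \cdots, n)$. The
matrix $\Gamma$ has the form


$\begin{array}{llll|lll}
\gamma_{11} & \gamma_{12} & \cdots & \gamma_{1,n-1} & \gamma_{1n} & & \\
\gamma_{21} & \gamma_{22} & \cdots & \gamma_{2,n-2} & \gamma_{2,n-1} & \gamma_{2n} & \\
\cdots & & & & & & \\
\gamma_{p1} & \gamma_{p2} & \cdots & \gamma_{p,n-p} & \gamma_{p,n-p+1} & \cdots & \gamma_{pn}
\end{array}$

where $\Gamma_I$ consists of all elements to the left of the vertical lines. By the
orthogonality conditions on the rows, $\gamma_{in}$ can take two values $\pm(1 - \sum_{j=1}^{n-1} \gamma_{ij}^2)^{1/2}$
$(i = 1 \cdots p)$. Because the inner product of any pair of row vectors is zero, all
the other elements in $\Gamma_D$ are determined.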

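We now obtain the Jacobian of (1) using Theorem 1, where $X = \tilde{T}\Gamma$ and
$\Gamma\Gamma' - I_p = 0$ take the place of $y_i = f_i$ and $f_i = 0$, respectively. Thus


(2) $J(X; \tilde{T}, \Gamma_I) = J(X, \Gamma\Gamma'; \tilde{T}, \Gamma) / J(\Gamma\Gamma'; \Gamma_D),$


where the right-hand side is expressed in terms of $\tilde{T}$ and $\Gamma_I$. Let $\Gamma\Gamma' = K: p \times p$,
where K is symmetric. We note that K and $\tilde{T}$ are unrelated, and hence the scheme
of partial derivatives for the numerator of (2) is


$\begin{array}{c|cc}
 & \tilde{T}\,(p(p + 1)/2) & \Gamma\,(pn) \\ \hline
K\,(p(p + 1)/2) & 0 & M_{12} \\
X\,(pn) & M_{21} & M_{22}
\end{array}$

the determinant of which is $|M_{22}|\, |M_{12} M_{22}^{-1} M_{21}|$. Write $X = (\underline{x}_1 \cdots \underline{x}_p)'$, where
$\underline{x}_i' = (x_{i1} \cdots x_{in})$ and $\Gamma = (\underline{\gamma}_1 \cdots \underline{\gamma}_p)'$, where $\underline{\gamma}_i' = (\gamma_{i1} \cdots \gamma_{in})$. Then

(4) $\underline{x}_i' = (t_{i1}, \cdots, t_{ii}, 0, \cdots, 0)\,\Gamma, \qquad i = 1 \cdots p.$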


$M_{21}$ and $M_{22}$ arise from (4), that is $\partial x_{ij}/\partial t_{ij}$ and $\partial x_{ij}/\partial \gamma_{ij}$, respectively, and
have the following forms:


ths oes tes -*° toe. Rees Bees 


"1 maa De his a3 0 0 


Y2 °*° vor 0 0 


0 
0 


v2 
° 
Ty Da, 0 


L2 | Dis, Dis, 


D, 


Tp—1| Digs p—1,2 


a 


Zt. 


where $D_{t_{ij}}: n \times n$ $(i = 1 \cdots p;\ j = 1 \cdots i)$ is a diagonal matrix with diagonal
elements $t_{ij}$. From $k_{ij} = \underline{\gamma}_i'\underline{\gamma}_j$ $(i, j = 1 \cdots p)$ we have


v1 Y2 ots Yo 


ky 2y1 een O- 
0 


kp-1.9| 0 


Hence $|M_{22}| = \prod_1^p |D_{t_{ii}}| = |\tilde{T}|^n$. Writing $\tilde{T}^{-1} = (t^{ij})$, $(i = 1 \cdots p;\ j = 1 \cdots i)$,
with $t^{ij} = 0$ for $j > i$, $M_{12} M_{22}^{-1} M_{21}$ can be shown to be equal to
0 BH: bat ne
0 wf -- 6 


2p 


yit’” yet y2t"” 


= Mal 


\0 
say, where N:p(p + 1)/2 X p(p + 1)/2, and 
< s** 0 oe ; | | 
7; = Le , Ti:(p — i) X (p — 1). 


t'? t' ip » ies f?” 


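Thus $|M_{12} M_{22}^{-1} M_{21}|$ is the product of the determinant of the first factor and $|N|$.
It is easily verified that the determinant of the first factor is $2^p$
and that $|N| = \prod_1^p t_{ii}^{-i}$. Combining our results we have


(5) $J(X; \tilde{T}, \Gamma_I) = 2^p \prod_1^p t_{ii}^{n-i} \Big/ J(\Gamma\Gamma'; \Gamma_D).$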
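Consider the multivariate probability law
(6) $p(X) = (2\pi)^{-pn/2}\, |A|^{-n/2} \exp[-\tfrac{1}{2} \operatorname{tr}(A^{-1}XX')]\, dX.$


Using transformation (1) and the Jacobian (5), we have the joint distribution of
$\tilde{T}$ and $\Gamma_I$:


(7) $p(\tilde{T}, \Gamma_I) = (2\pi)^{-pn/2}\, |A|^{-n/2} \exp[-\tfrac{1}{2} \operatorname{tr}(A^{-1}\tilde{T}\tilde{T}')]\; 2^p \prod_1^p t_{ii}^{n-i}\; J^{-1}(\Gamma\Gamma'; \Gamma_D)\, \{d\tilde{T}\}\{d\Gamma_I\}.$


To obtain the distribution of $\tilde{T}$ alone, we must integrate out the variables of
$\Gamma_I$ over the domain $\Gamma\Gamma' = I$, that is we must evaluate


(8) $L(n, p) = \int_{\Omega} J^{-1}(\Gamma\Gamma'; \Gamma_D)\, d\Gamma_I,$


where $\Omega: \Gamma\Gamma' = I$. Since the integral is unity over the space of $\tilde{T}$: $-\infty < t_{ij} < \infty$ $(i \ne j)$, $0 < t_{ii} < \infty$, we obtain


(9) $p(\tilde{T}) = c\, |A|^{-n/2} \prod_1^p t_{ii}^{n-i} \exp(-\tfrac{1}{2} \operatorname{tr}(A^{-1}\tilde{T}\tilde{T}'))\, d\tilde{T},$


where $c = 2^{p - pn/2}\, \pi^{-p(p-1)/4} \prod_1^p \Gamma^{-1}((n - i + 1)/2)$, with $\Gamma(\cdot)$ here the gamma function. Incidentally, $L(n, p)$ can now
easily be evaluated, namely,


(10) $L(n, p) = \pi^{pn/2 - p(p-1)/4} \prod_1^p \Gamma^{-1}((n - i + 1)/2).$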


With respect to the integral (8), it should be noted that if the probability density
involves $\Gamma$, it will be necessary to add up the probability densities for the $2^p$
points in $\Gamma$ (subject to $\Gamma\Gamma' = I$) which correspond to a particular point of
$\Gamma_I$, by virtue of Lemma 3. If the probability density is free of $\Gamma$, a factor of
$2^p$ is sufficient.
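The constant in (9) can be sanity-checked in the simplest case. The sketch below is illustrative and makes an assumption not in the text: it takes the covariance matrix equal to the identity, so that the density of the rectangular coordinates factorizes into one-dimensional integrals (each off-diagonal element contributes $\sqrt{2\pi}$, each diagonal element an integral of $t^{n-i} e^{-t^2/2}$ over the positive axis). The integrals are evaluated by a composite Simpson rule with an arbitrary upper limit of 40; the function names are hypothetical.

```python
from math import exp, gamma, pi


def constant_c(n, p):
    """c = 2^(p - pn/2) * pi^(-p(p-1)/4) / prod_i Gamma((n - i + 1)/2)."""
    c = 2.0 ** (p - p * n / 2) * pi ** (-p * (p - 1) / 4)
    for i in range(1, p + 1):
        c /= gamma((n - i + 1) / 2)
    return c


def simpson(g, a, b, panels):
    """Composite Simpson rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def diagonal_integral(k, upper=40.0, panels=40000):
    """Integral of t^k exp(-t^2/2) over (0, upper)."""
    return simpson(lambda t: t ** k * exp(-0.5 * t * t), 0.0, upper, panels)


def total_mass(n, p):
    """Integral of the density (9) over all rectangular coordinates, identity covariance."""
    mass = constant_c(n, p) * (2 * pi) ** (p * (p - 1) / 4)
    for i in range(1, p + 1):
        mass *= diagonal_integral(n - i)
    return mass


if __name__ == "__main__":
    for n, p in [(4, 1), (5, 3), (6, 4)]:
        print(n, p, total_mass(n, p))
```

A total mass of one for several (n, p) pairs is consistent with the stated constant, although a check at the identity covariance does not by itself test the factor involving the determinant of the covariance matrix.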


5. Distribution of the roots of a certain determinantal equation. The existence 
of the requisite transformation is given by the following theorem. 

THEOREM 4. If $X: p \times n$ $(p \le n)$ and $Y: p \times m$ $(p \le m)$ are of rank $r \le p$
and p, respectively, then there exist matrices $Z: p \times p$, $D_\theta: r \times r$, $\Gamma_1: p \times n$, $\Delta_1: p \times m$
such that


(1) vs ao \T, a 
Q s/ 


where Z is nonsingular, $D_\theta$ is diagonal with elements $\theta_1 \ge \theta_2 \ge \cdots \ge \theta_r > 0$,
the $\theta_i$ $(i = 1 \cdots r)$ are the nonvanishing roots of $|XX' - \theta YY'| = 0$, and $\Gamma_1$ and
$\Delta_1$ are row orthogonal. If in addition (a) X is of rank p, (b) the roots are distinct,
and (c) the elements $(z_{11} \cdots z_{1p})$ of Z are positive, the representation is unique [8].

We note that (a) and (b) are satisfied with probability 1, and hence we need 
only guarantee (c). Assuming (a) and (b) we rewrite (1) and introduce pseudo 
variates. 


(2) X =Z(DO)T, Y = Z0)s 
T, = 1,, 


7 inn’ ‘ , rr. “ SS, 
where I’ = (T\T2):n * n and A’ = (A,A9):m XK m, are orthogonal. S = 


SS, 


is a skew-symmetric matrix ($S_1: p \times p$, $S_2: (n - p) \times (n - p)$) related to $\Gamma$ by
$S = (I + \Gamma)^{-1}(I - \Gamma)$, provided the inverse exists; T is similarly related to $\Delta$.
We note that the left-hand side contains

$pn + pm + (n - p)(n - p - 1)/2 + (m - p)(m - p - 1)/2 = p^2 + p + n(n - 1)/2 + m(m - 1)/2$


variates as does the right-hand side. We now proceed to obtain the Jacobian 
of (2). Familiarity with the techniques in [11] is assumed. In particular, the 
Jacobian of a transformation is equal to the Jacobian of the transformation in 
the differentials. Taking differentials in (2), we have 


(3a)    (dX) = (dZ)(D_θ 0)Γ + Z(D_dθ 0)Γ + Z(D_θ 0)(dΓ),
(3b)    (dY) = (dZ)(I 0)Δ + Z(I 0)(dΔ),
(3c)    (dS_2) = (dS_2),
(3d)    (dT_2) = (dT_2).

Pre- and post-multiply (3a) by D_θ^{-1}Z^{-1} and Γ', respectively, and (3b) by Z^{-1}
and Δ', respectively:

(4a)    D_θ^{-1}Z^{-1}(dX)Γ' = D_θ^{-1}Z^{-1}(dZ)(D_θ 0) + D_θ^{-1}(D_dθ 0) + (I 0)(dΓ)Γ',
(4b)    Z^{-1}(dY)Δ' = Z^{-1}(dZ)(I 0) + (I 0)(dΔ)Δ'.


From d(ΓΓ') = (dΓ)Γ' + Γ(dΓ') = 0, it follows that (dΓ)Γ' is skew-symmetric,
and hence we can let

(5a)    A = (dΓ)Γ' = −2(I + S)^{-1}(dS)(I − S)^{-1},

(5b)    B = (dΔ)Δ' = −2(I + T)^{-1}(dT)(I − T)^{-1}.

(5c)    U = D_θ^{-1}Z^{-1}(dX)Γ',    V = Z^{-1}(dY)Δ',
        W = Z^{-1}(dZ),              D_η = D_θ^{-1}D_dθ.
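
The relation between a skew-symmetric S and the orthogonal Γ, and the expression for (dΓ)Γ' in terms of dS, can be checked numerically. The sketch below assumes the transform Γ = (I + S)^{-1}(I − S), which is its own inverse, and compares a central-difference approximation of (dΓ)Γ' with −2(I + S)^{-1}(dS)(I − S)^{-1}. The helper name `cayley`, the matrix size and the step `eps` are choices made here for illustration only.

```python
import numpy as np

def cayley(S):
    """Return (I + S)^{-1} (I - S)."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(I + S, I - S)

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
S = M - M.T                        # skew-symmetric matrix
N = rng.standard_normal((n, n))
dS = N - N.T                       # skew-symmetric direction of change

Gamma = cayley(S)
eps = 1e-5
# Central-difference approximation to the differential of Gamma along dS.
dGamma = (cayley(S + eps * dS) - cayley(S - eps * dS)) / (2 * eps)

I = np.eye(n)
A_numeric = dGamma @ Gamma.T
A_formula = -2 * np.linalg.solve(I + S, dS) @ np.linalg.inv(I - S)
```

Both matrices agree to within the finite-difference error, and `A_formula` is skew-symmetric, as the argument from d(ΓΓ') = 0 requires.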


Transformation (4) becomes easily

(6a)    U = D_θ^{-1}WD_θ(I 0) + D_η(I 0) + (I 0)A,
(6b)    V = W(I 0) + (I 0)B,
(6c)    (dS_2) = −½(I + S_2)A_2(I − S_2) + ⋯,
(6d)    (dT_2) = −½(I + T_2)B_2(I − T_2) + ⋯,

where the additional terms in (6c) and (6d) are independent of A_2 and B_2,
respectively. The Jacobian can be written as follows:

        J(X, Y, S_2, T_2; Z, θ, S, T) = J(dX, dY, dS_2, dT_2; dZ, dθ, dS, dT)
(7)       = J(dX, dY, dS_2, dT_2; U, V, A_2, B_2) J(U, V, A_2, B_2; W, η, A, B)
            · J(W, η, A, B; dZ, dθ, dS, dT) = J_1 J_2 J_3.

We now evaluate the components using [11]. J_1 arises from (5):

(8)     J_1 = J(dX; U) J(dY; V) J(dS_2; A_2) J(dT_2; B_2)
            = |D_θ|^n |Z|^n |Z|^m J(dS_2; A_2) J(dT_2; B_2).


For simplicity the last two terms will be denoted by a(S_2, n − p) and
a(T_2, m − p), respectively. J_2 arises from (6), yielding the scheme of partial
derivatives shown in Figure 1, where I_1 = I_{(n−p)(n−p−1)/2}, I_2 = I_{(m−p)(m−p−1)/2}, and
all other identity matrices I are p(p − 1)/2. The determinant of the matrix
given in Figure 1 is evaluated as follows.
D_1 arises from that part of (6a) which contains u_ij = (θ_j/θ_i)w_ij + terms
independent of w_ij. ∂u_ij/∂w_ij = θ_j/θ_i and hence D_1: p(p − 1)/2 × p(p − 1)/2 is
diagonal with elements

        θ_2/θ_1, θ_3/θ_1, ⋯, θ_p/θ_{p−1}.

Since D_2 = D_1^{-1},

(9)     J_2 = ∏_{i<j} (θ_i/θ_j − θ_j/θ_i) = ∏_{i=1}^{p} θ_i^{−(p−1)} ∏_{i<j} (θ_i^2 − θ_j^2).

Finally,

(10)    J_3 = J(W; dZ) J(η; dθ) J(A; dS) J(B; dT)
            = |Z|^{−p} |D_θ|^{−1} a^{−1}(S, n) a^{−1}(T, m).

Combining our results (8) to (10):

(11)    J = ∏_{i=1}^{p} θ_i^{n−p} ∏_{i<j} (θ_i^2 − θ_j^2) |Z|^{n+m−p} g(S)h(T),

where g(S) and h(T) are functions of S and T, respectively.
Consider p(n + m) random variables X* and Y* following the multivariate
normal probability law as defined by the density function

(12)    p(X*, Y*) = c_1 exp[−½ tr(A^{-1}X*X*' + B^{-1}Y*Y*')],

where c_1 = (2π)^{−p(n+m)/2} |A|^{−n/2} |B|^{−m/2}, X*: p × n (p ≤ n), Y*: p × m (p ≤ m),
A: p × p and B: p × p are the population covariance matrices and are positive
definite. The roots of |X*X*' − θ^2 Y*Y*'| = 0 are invariant under a nonsingular
transformation X* = L^{-1}X, Y* = L^{-1}Y, where X: p × n, Y: p × m, and L: p × p
is nonsingular and is chosen so that LAL' = D_ρ, LBL' = I, and ρ_1, ⋯, ρ_p > 0
are the population roots of |A − ρB| = 0. The Jacobian J(X*, Y*; X, Y) of
the transformation is |L|^{−(n+m)}, and |A| = |D_ρ||L|^{−2}, |B| = |L|^{−2}, which
yields the density function of X and Y:

(13)    p(X, Y) = c_2 exp[−½ tr(D_ρ^{-1}XX' + YY')],

where c_2 = (2π)^{−p(n+m)/2} |D_ρ|^{−n/2}. We now make use of the main transformation
(2) and the Jacobian (11) which leads to the density function of Z, θ,
S, T:

        p(Z, θ, S, T) = c_3 ∏_{i=1}^{p} θ_i^{n−p} ∏_{i<j} (θ_i^2 − θ_j^2) |Z|^{n+m−p}
                        · exp[−½ tr(D_ρ^{-1}ZD_θ^2 Z' + ZZ')],

where c_3 = (2π)^{−p(n+m)/2} |D_ρ|^{−n/2} g(S)h(T). We now integrate out the extraneous
variates to obtain the distribution of θ_1 ⋯ θ_p. The integrations will be indicated
for the null case, that is when A = B or equivalently D_ρ = I. One of the
integrations involved is based on the following lemma.
LEMMA.

        I(r, k, A) = ∫_Ω |WW'|^k exp[−½ tr(AWW')] {dW},

where W: r × r; A: r × r is positive definite, and Ω: −∞ < w_ij < ∞
(i, j = 1, ⋯, r).



Proor. Since A is positive definite we may write A = 0’0 and make the 
successive transformations X = 77, tT = (2 + S)'U — 8) and V = UT. The 
Jacobians [11] are 


pv p 
J(X; T, 8) = [I 7? reer irT4 § a StF: F) = I] MiG 
1 i 
Hence 


I(r, k,a) = [ I] ui vis exp [—43 tr VV’) {dV} 
“2; 1 


+ S " {dS}, 


where Ω_1: −∞ < v_ij < ∞ (i ≠ j), 0 < v_ii < ∞, and Ω_2 is the total space of
r(r − 1)/2 dimensions. The first integral is a product of gamma functions and
the second integral is evaluated in [12] and [13], giving


I(r, k, a) = | Al TI|2™ “3 (Strate Cees 
1 = 


xX (27) 


The integral over the Z space is thus 2^{−p} I(p, (n + m − p)/2, I + D_θ^2), where
the 2^{−p} arises from the restriction (z_11, ⋯, z_1p) > 0. The constants arising from
integrating over S and T can be obtained directly using [12] and [13], taking
account of the introduction of the pseudo-variates, or indirectly from the
distribution of rectangular coordinates. They turn out to be 2^p L(n, p) and 2^p L(m, p)
of the previous section. Combining all results we obtain:

        p(θ_1, ⋯, θ_p) = k ∏_{i=1}^{p} θ_i^{n−p} (1 + θ_i^2)^{−(n+m)/2} ∏_{i<j} (θ_i^2 − θ_j^2),


where 


pa verry r(ste— st) 


/o(e=et)e( 
noting that


(24) my gerpeee + 


“i aa " ie 4 asia a 
x I] r( : - Pp + cs (2 3H) [2”L(n, p)|[2”L(m, p)] 
1 


2 ) 9 


= k. 


If we let θ_i^2 = β_i or τ_i = β_i/(1 + β_i), we may obtain other familiar forms for the
density function of the roots of related determinantal equations.


Added in proof. A. T. James, "Normal multivariate analysis and the orthogonal
group", Ann. Math. Stat., Vol. 25 (1954), pp. 40-75, obtained a similar result
as an application of Grassmann and Stiefel manifolds.


6. Conclusion. In the present paper the methods proposed have been illus- 
trated by giving a new derivation of the joint distribution of rectangular co- 
ordinates and the joint distribution of the roots of a certain determinantal 
equation. These methods also apply to other situations, for example the singular 
case in the above examples, the joint distribution of canonical correlations; 
multiple and partial correlations; inverse and adjoint of certain matrices. In 
essence, the problem consists in obtaining the requisite transformation which 
will lead to the desired variates, and then applying the proposed techniques to 
evaluate the Jacobian and integrate out any extraneous variates. 
ON THE COMPUTATION OF THE SAMPLING CHARACTERISTICS OF
A GENERAL CLASS OF SEQUENTIAL DECISION PROBLEMS


By G. E. ALBERT


University of Tennessee


Summary. The connection between the theory of random walks and Wald's
theory [10], [11] of sequential probability ratio tests of hypotheses has been
remarked by several authors. In particular, Kemperman [5] has exploited that
connection to obtain integral equations for the determination of the decision
probabilities and the expected sample size of a Wald sequential test. It is the purpose
of the present paper (1) to generalize Kemperman's integral equations to apply
to a fairly extensive class of sequential multiple decision problems, and (2) to
indicate methods of obtaining practical results from such integral equations.

Part I of the paper is purely theoretical. It presents the integral equations al- 
ready mentioned and generalizes a method of obtaining upper and lower bounds 
for their solutions that seems to have been first published by Kemperman [5] 
and Snyder [7] simultaneously. 

In Part II the possibilities for application of the general theory are illustrated 
by a discussion of Wald’s sequential tests for simple alternatives on the parameter 
of a distribution, under the hypothesis that a sufficient statistic for that parameter 
exists. In particular, it is shown that the Kemperman-Snyder method for ob- 
taining bounds for the solutions of the integral equations may be used to obtain 
substantial improvements over the bounds given by Wald for the operating char- 
acteristics of the test for simple alternatives on the mean of a normal distribution. 
Methods of numerical analysis are indicated that might be useful in a well- 
equipped computing laboratory for further improvement of the bounds. 

It is clear from the results obtained here that the methods used, coupled with 
extensive numerical work, should yield definitive improvements over Wald’s 
approximate methods for setting the decision boundaries and estimating the 
sample size moments for sequential tests. It is hoped that the decision rule 
adopted in Part I is sufficiently general that the theory will provide a useful tool 
in the design and study of multiple decision problems. 


Part I. THEORY


1. Random walks and decision problems. Let R be an abstract space of points
x. For each fixed x in R let P(A | x) denote a conditional probability measure
defined on a Borel field ℬ of subsets A of R. It will be assumed that, for each set
A of the field ℬ, P(A | x) is a Borel measurable function of x.

Let d_i, i = 1, 2, ⋯, r, denote a set of r distinct decisions, one of which is to
be made about the probability function P(A | x) as a result of a sequentially
performed experiment as described below. The symbol d_0 will denote the decision
to continue experimentation at any step of the sequence instead of making
one of the terminal decisions. Wald [12] gives a more detailed description of this
type of decision problem. Define r nonnegative measurable functions π_i(x),
i = 1, 2, ⋯, r, on R with the property Σ_{i=1}^{r} π_i(x) ≤ 1 for each x in R and let
π_0(x) = 1 − Σ_{i=1}^{r} π_i(x).

The experiment takes the form of a random walk described as follows. Let x_0
be an arbitrary point in R. Make one and only one of the decisions d_i with
respective probabilities π_i(x_0), i = 0, 1, 2, ⋯, r. If the decision made is d_i,
i ≥ 1, the experiment is terminated; if it is d_0, a point x_1 is drawn from R using
the distribution specified by P(A | x_0). Again make one and only one of the
decisions d_i, i = 0, 1, 2, ⋯, r, with respective probabilities π_i(x_1). If the decision
made is d_i, i ≥ 1, the experiment is terminated; if it is d_0, a point x_2 is drawn
from R using the distribution specified by P(A | x_1) and so on until one of the
desired decisions d_i, i = 1, 2, ⋯, r, is made at a point x_n, n = 0, 1, 2, ⋯.
In order to guarantee that the duration n of the experiment be finite with
probability one, the following assumption will be made.
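
The random walk just described can be simulated directly. The sketch below is a minimal illustration, not part of the paper: the function `run_experiment`, its arguments, and the Gaussian two-boundary example are all hypothetical choices. The caller supplies `pi(x)`, returning the probabilities [π_0(x), π_1(x), ⋯, π_r(x)], and `draw_next(x, rng)`, which draws the next point from P(· | x).

```python
import random

def run_experiment(x0, pi, draw_next, rng, max_steps=100_000):
    """Simulate one sequential experiment; return (i, n).

    i is the index of the terminal decision d_i and n the stage at which it
    was made.  max_steps is a safety cap added for this sketch.
    """
    x = x0
    for n in range(max_steps + 1):
        probs = pi(x)
        i = rng.choices(range(len(probs)), weights=probs)[0]
        if i >= 1:                 # terminal decision d_i made at stage n
            return i, n
        x = draw_next(x, rng)      # decision d_0: continue experimentation
    raise RuntimeError("no terminal decision within max_steps")

# Hypothetical example: Gaussian steps, d_1 at or below -b, d_2 at or above a.
a, b = 2.0, 2.0

def pi(x):
    if x <= -b:
        return [0.0, 1.0, 0.0]
    if x >= a:
        return [0.0, 0.0, 1.0]
    return [1.0, 0.0, 0.0]

def draw_next(x, rng):
    return x + rng.gauss(0.0, 1.0)

rng = random.Random(0)
results = [run_experiment(0.0, pi, draw_next, rng) for _ in range(200)]
```

Averaging the second component of `results` gives a crude Monte Carlo estimate of the expected duration, and the frequency of each first component estimates the decision probabilities.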

ASSUMPTION 1. There is a constant ρ, 0 ≤ ρ < 1, and an integer M such
that for all x_0 in R and all m ≥ M the inequality

(1)     ∫_R ⋯ ∫_R [∏_{i=1}^{m} π_0(x_i)] P(dx_m | x_{m−1}) P(dx_{m−1} | x_{m−2}) ⋯ P(dx_1 | x_0) ≤ ρ

is satisfied.

The notation in (1) is similar in most respects to that used by Doob [1]. The
integral is to be interpreted as an iterated Lebesgue-Stieltjes integral whenever
such exists. Doob's paper gives further discussion. A subscript on the symbol dx
denotes the variable of integration.

Let p_ik(x_0), i = 1, 2, ⋯, r, k = 0, 1, 2, ⋯, denote the probability that the
terminal decision d_i is made at the stage n = k if x_0 is the arbitrary starting point
of the experiment. Evidently, for each i = 1, 2, ⋯, r,

(2)     p_i0(x_0) = π_i(x_0),

        p_{i,k+1}(x_0) = ∫_R ⋯ ∫_R [∏_{j=0}^{k} π_0(x_j)] π_i(x_{k+1}) P(dx_{k+1} | x_k)
                             · P(dx_k | x_{k−1}) ⋯ P(dx_1 | x_0)

                       = π_0(x_0) ∫_R p_ik(y) P(dx_y | x_0),    k ≥ 0.

The probability that the experiment be terminated at the stage n = k,
regardless of which one of the terminal decisions is made, is given by

(3)     p_k(x_0) = Σ_{i=1}^{r} p_ik(x_0).

It is obvious that p_0(x_0) = 1 − π_0(x_0) and that the rest of the functions (3)
satisfy the same recurrence relations (2) relative to k as do the functions p_ik(x).
That the duration n of the experiment is a random variable on the range n =
0, 1, 2, ⋯ will follow from

(4)     lim_{k→∞} P(n > k) = lim_{k→∞} Σ_{j=k+1}^{∞} p_j(x_0) = 0.

To establish (4), note that by (2), for any j ≥ 1,

        p_j(x_0) = ∫_R ⋯ ∫_R [∏_{i=0}^{j−1} π_0(x_i)] [1 − π_0(x_j)] P(dx_j | x_{j−1}) ⋯ P(dx_1 | x_0).

Thus, for any integer k,

(5)     Σ_{j=0}^{k} p_j(x_0) = 1 − ∫_R ⋯ ∫_R [∏_{i=0}^{k} π_0(x_i)] P(dx_k | x_{k−1}) ⋯ P(dx_1 | x_0).

Let N be any integer and suppose that k ≥ NM, where M is as specified in
Assumption 1. Clearly the integral on the right in (5) is dominated by the quantity
π_0(x_0)ρ^N and the limit (4) follows.
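
When R is a finite set, the recurrence (2) is a matrix iteration and relations (3) to (5) can be verified numerically. The following sketch assumes a hypothetical three-state chain; the transition matrix `P`, the decision probabilities `pi` and the function name are invented for illustration.

```python
import numpy as np

def termination_probabilities(P, pi, K):
    """Return p with p[k, i-1, x] = probability that d_i is made at stage k from x.

    P[x, y] is the transition probability from x to y; pi[0] holds pi_0 and
    pi[1:] the terminal-decision probabilities (finite-state form of (2)).
    """
    pi0 = pi[0]
    p = [pi[1:].copy()]                       # p_{i0}(x) = pi_i(x)
    for _ in range(K):
        # p_{i,k+1}(x) = pi_0(x) * sum_y P[x, y] p_{ik}(y)
        p.append(pi0 * (p[-1] @ P.T))
    return np.array(p)

# Hypothetical 3-state chain with two terminal decisions.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
pi = np.array([[0.5, 0.8, 0.6],    # pi_0: continue
               [0.3, 0.1, 0.1],    # pi_1
               [0.2, 0.1, 0.3]])   # pi_2
K = 60
p = termination_probabilities(P, pi, K)
p_k = p.sum(axis=1)                           # equation (3)
prob_not_stopped = 1.0 - p_k.sum(axis=0)      # P(n > K), compare (5)
```

Here π_0 ≤ 0.8 everywhere, so Assumption 1 holds with M = 1 and ρ = 0.8, and `prob_not_stopped` decays at least geometrically in K.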


2. Some integral equations. Let q_k(x), k = 0, 1, 2, ⋯ be any sequence of
functions defined and Borel-measurable on R which satisfy, on R, the conditions

(6)     0 ≤ q_0(x) ≤ K < ∞,

        q_{k+1}(x) = π_0(x) ∫_R q_k(y) P(dx_y | x),    k ≥ 0.

The termination probabilities p_ik and p_k defined in the preceding section are
admissible q_k.

THEOREM 1. If the sequence q_k(x), k = 0, 1, 2, ⋯ satisfies (6) and λ is any
complex number for which

(7)     ρ|λ|^M < 1,

where ρ and M are the constants specified in Assumption 1, then the series

(8)     Σ_{j=0}^{∞} λ^j q_j(x)

converges uniformly and absolutely for x in R to the bounded solution u(x; λ) of the
integral equation

(9)     u(x; λ) = q_0(x) + λπ_0(x) ∫_R u(y; λ) P(dx_y | x).

PROOF. The convergence properties of the series (8) are easily established.
Using (6) repeatedly, for any x_0 in R one has

        q_{j+1}(x_0) = ∫_R ⋯ ∫_R [∏_{i=0}^{j} π_0(x_i)] q_0(x_{j+1}) P(dx_{j+1} | x_j) ⋯ P(dx_1 | x_0).

By Assumption 1, if j ≥ NM, one has

        |λ^{j+1} q_{j+1}(x_0)| ≤ Kρ^N |λ|^{j+1},

so that the series Σ_{j=M}^{∞} |λ^{j+1} q_{j+1}(x_0)| is termwise dominated by the series of
constants

        K Σ_{N=1}^{∞} ρ^N Σ_{j=NM}^{(N+1)M−1} |λ|^{j+1} = K Σ_{j=1}^{M} |λ|^j Σ_{N=1}^{∞} (ρ|λ|^M)^N,

which converges by (7). That the series (8) satisfies the integral equation (9)
is an immediate consequence of the uniformity of convergence and the relation
(6); for,

        q_0(x_0) + Σ_{j=1}^{∞} λ^j q_j(x_0) = q_0(x_0) + λπ_0(x_0) ∫_R Σ_{j=1}^{∞} λ^{j−1} q_{j−1}(y) P(dx_y | x_0).

Uniqueness of the solution u(x; λ) of (9) follows almost exactly as in the proof
given by Wasow [14], pp. 201-202.
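
On a finite state space the statement of Theorem 1 can be checked by comparing a long partial sum of the series (8) with a direct solution of the linear system corresponding to (9). The sketch below assumes the same kind of hypothetical three-state chain as a stand-in for R; the function names and the particular value of λ are illustrative only.

```python
import numpy as np

def series_solution(P, pi0, q0, lam, terms=500):
    """Partial sum of (8), generating q_{k+1} = pi_0 * (P q_k) as in (6)."""
    q = q0.astype(complex)
    total = np.zeros_like(q)
    power = 1.0 + 0.0j
    for _ in range(terms):
        total += power * q
        q = pi0 * (P @ q)
        power *= lam
    return total

def direct_solution(P, pi0, q0, lam):
    """Solve u = q_0 + lam * pi_0 * (P u), the finite-state form of (9)."""
    n = len(q0)
    return np.linalg.solve(np.eye(n) - lam * np.diag(pi0) @ P, q0.astype(complex))

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
pi0 = np.array([0.5, 0.8, 0.6])      # rho = 0.8 with M = 1
q0 = 1.0 - pi0                       # q_k = p_k, the termination probabilities
lam = 1.1 + 0.2j                     # rho * |lam| is about 0.89 < 1

u_series = series_solution(P, pi0, q0, lam)
u_direct = direct_solution(P, pi0, q0, lam)
```

With q_0 = 1 − π_0 the sum is the generating function of the duration n, so at λ = 1 the solution is identically one.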

There are several important applications of Theorem 1. First, if the termination
probabilities are chosen as the q_k(x), the series (8) becomes the generating
function of that sequence. Setting λ = exp(z), one obtains the moment generating
function of the distribution of the duration n.

COROLLARY 1. The moment generating function of the distribution of the duration
n exists for all values of its argument z such that ρ|exp(z)|^M < 1 and, when
considered as a function of the starting point x_0 of the experiment, satisfies the integral
equation (9) with λ = exp(z), q_0(x) = 1 − π_0(x), and x = x_0.

As a consequence of (4), the probability is unity that one of the decisions
d_i, i = 1, 2, ⋯, r, will be made. If the arbitrary starting point of the experiment
is x_0, the probability P_i(x_0) of making the ith decision is given by P_i(x_0) =
Σ_{k=0}^{∞} p_ik(x_0), i = 1, 2, ⋯, r. Since the functions (2) are admissible q_k in
Theorem 1, and λ = 1 is allowed by (7), one has a second application of the theorem.

COROLLARY 2. The probability P_i(x_0) of making the decision d_i, 1 ≤ i ≤ r,
at some finite stage of the experiment satisfies the integral equation

(10)    P_i(x_0) = π_i(x_0) + π_0(x_0) ∫_R P_i(y) P(dx_y | x_0).

It follows from Corollary 1 and the assumption ρ < 1 that the moment
generating function of the duration n exists for positive real values of its
argument z. It follows that all of the moments of the distribution of the duration
exist. Let M_k(x_0) denote the kth moment. Formal differentiation of the moment
generating function of n leads to the third result.

COROLLARY 3. The kth moment M_k(x_0) of the distribution of the duration n of an
experiment that begins at x_0 satisfies the equation

        M_k(x_0) = π_0(x_0) Σ_{s=0}^{k−1} C(k, s) ∫_R M_s(y) P(dx_y | x_0) + π_0(x_0) ∫_R M_k(y) P(dx_y | x_0),

where C(k, s) is the binomial coefficient and M_0 = 1.

In particular, the first two moments satisfy the equation (9) of Theorem 1
with λ = 1, x = x_0, and q_0(x_0) = π_0(x_0) for the first moment, and with q_0(x_0) =
2M_1(x_0) − π_0(x_0) for the second moment.
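
For a finite state space the two special cases of Corollary 3 reduce to linear systems, which makes them easy to check against the distribution of n itself. In the sketch below the chain, the vector `pi0` and the truncation point of the sums are hypothetical; `Q` denotes the matrix with entries π_0(x)P(x, y).

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
pi0 = np.array([0.5, 0.8, 0.6])
Q = np.diag(pi0) @ P
I = np.eye(3)

# Equation (9) with lam = 1: q_0 = pi_0 gives the first moment,
# q_0 = 2 M_1 - pi_0 gives the second moment.
M1 = np.linalg.solve(I - Q, pi0)
M2 = np.linalg.solve(I - Q, 2.0 * M1 - pi0)

# Direct check from the distribution of n: p_0 = 1 - pi_0, p_{k+1} = Q p_k.
p_k = 1.0 - pi0
M1_sum = np.zeros(3)
M2_sum = np.zeros(3)
for k in range(2000):
    M1_sum += k * p_k
    M2_sum += k ** 2 * p_k
    p_k = Q @ p_k

variance = M2 - M1 ** 2
```

The vector `variance` gives the variance of the duration for each starting state.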
Wasow [14] established Theorem 1 and Corollary 3 in a special case using a
similar proof. Kemperman [5] gave Corollaries 2 and 3 for cumulative sums on
the real line. He assumed fixed decision boundaries for each step.


3. Approximate solutions. In many of the important applications of the 
integral equations of the last section, the equations are likely to be very difficult 
if not impossible to solve by methods that will give useful numerical results. It 
may be necessary to resort to approximate solutions. To this end, the following 
results are offered. 

THEOREM 2. Let P_k(A | x), k = 0, 1, 2, ⋯ be a sequence of probability functions
of the type defined in Section 1 which satisfy Assumption 1 with constants
ρ and M that are independent of k. Let q_0(x) be nonnegative and bounded on R, and
suppose that λ is any complex constant satisfying (7). Denote by u_k(x; λ), k = 0, 1,
2, ⋯ the solutions of the integral equations obtained by replacing P in (9) by P_k.

If, for the sequence of functions

        f_k(x) = λπ_0(x) {∫_R u_0(y; λ) P_0(dx_y | x) − ∫_R u_0(y; λ) P_k(dx_y | x)},
                                                                k = 0, 1, 2, ⋯,

it is true that lim_{k→∞} f_k(x) = 0 is satisfied uniformly for x in R, then lim_{k→∞} u_k(x; λ)
= u_0(x; λ) uniformly for x in R.

PROOF. The sequence of difference functions w_k(x; λ) = u_0(x; λ) − u_k(x; λ),
k = 1, 2, 3, ⋯ satisfies the sequence of integral equations

(11)    w_k(x; λ) = f_k(x) + λπ_0(x) ∫_R w_k(y; λ) P_k(dx_y | x).

By iteration

        w_k(x; λ) = f_k(x) + π_0(x) Σ_{N=1}^{∞} λ^N ∫_R ⋯ ∫_R [∏_{i=1}^{N−1} π_0(x_i)] f_k(x_N)
                        · P_k(dx_N | x_{N−1}) ⋯ P_k(dx_1 | x).

Let ε denote the least upper bound of |f_k(x)| on R. Splitting the series at N =
M and treating the sum from N = 1 to N = M by an obvious method and the
remaining sum as in the convergence proof in Theorem 1, one obtains

        |w_k| ≤ ε {1 + Σ_{N=1}^{M} |λ|^N + Σ_{N=1}^{∞} ρ^N Σ_{j=NM}^{(N+1)M−1} |λ|^{j+1}}.

The series in brackets is convergent and independent of x and k. Since lim_{k→∞} ε =
0, the theorem is proved.

The remaining results of this section extend and exploit a method published
simultaneously by Kemperman [5] and Snyder [7]. For each point x in a space
B let G(A | x) be a measure function defined on a field ℬ' of Borel subsets of
B, and suppose that for each A in ℬ', G(A | x) is a measurable function of x.
It will be assumed that for some constant ρ', 0 < ρ' < 1, and integer M',

(12)    ∫_B ⋯ ∫_B G(dx_1 | x_0) G(dx_2 | x_1) ⋯ G(dx_{M'} | x_{M'−1}) ≤ ρ',    all x_0 in B.

Let f(x) be measurable and 0 ≤ f(x) ≤ K' < ∞ on B, and suppose that w(x)
is the solution of the integral equation

(13)    w(x) = f(x) + ∫_B w(y) G(dx_y | x).

DEFINITION. A nonnegative function h(x) will be called an upper (lower)
function for w(x) if its iterate

(14)    h_1(x) = f(x) + ∫_B h(y) G(dx_y | x)

is less than or equal (greater than or equal) to h(x) for all x in B.
Let φ_A(x) denote the characteristic function of the subset A of B; φ_A(x) has
the value one if x is in A and zero if x is not in A. For sets A of the Borel field
ℬ', define the functions G_1(A | x) = G(A | x) and

        G_s(A | x) = ∫_B ⋯ ∫_B φ_A(x_s) G(dx_s | x_{s−1}) ⋯ G(dx_1 | x),    s = 2, 3, 4, ⋯;

also, define the functions

        f_s(x) = f(x) + Σ_{k=1}^{s−1} ∫_B f(y) G_k(dx_y | x),    s = 1, 2, 3, ⋯.

By iteration of (13), w(x) also satisfies

(15)    w(x) = f_s(x) + ∫_B w(y) G_s(dx_y | x),    s = 1, 2, 3, ⋯.

Repeated iterations of a function h(x) by the operator on the right in (14) define
the functions

(16)    h_s(x) = f_s(x) + ∫_B h(y) G_s(dx_y | x),    s = 1, 2, 3, ⋯

of which h_1 is a special case.

THEOREM 3. If h(x) is an upper (lower) function for the solution w(x) of (13),
each of the functions h_s(x), s = 1, 2, 3, ⋯, given by (16) is an upper (lower)
function for w(x).

The proof is evident. The fundamental utility of upper and lower functions is
expressed by

THEOREM 4. An upper (lower) function for the solution w(x) of (13) is an upper
(lower) bound for w(x) on B.

PROOF. Consider an upper function for w(x). (The proof is similar for the case
of a lower function.) Let U be the least upper bound of w(x) − h(x) on B and
assume that h(x) is not an upper bound. Then U > 0 and for an arbitrary
positive number ε there is a point x' in B for which w(x') − h(x') > U − ε. By
Theorem 3, h_s(x) ≤ h(x), s ≥ 1, so w(x') − h_{M'}(x') > U − ε, where M' is the
integer used in (12). By subtraction of (16) from (15) with s = M',

        ∫_B [w(y) − h(y)] G_{M'}(dx_y | x') > U − ε.

By (12) and the definition of U, Uρ' > U − ε; thus U < ε/(1 − ρ'). Since ε
was arbitrary, the assumption U > 0 has been contradicted.
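
Theorems 3 and 4 are easy to observe on a discretized problem in which B is a finite set and G is a nonnegative matrix with row sums below one, so that (12) holds with M' = 1. In the sketch below the matrix `G`, the vector `f` and the constant starting functions are hypothetical; the constants are chosen just large (small) enough that their iterates (14) lie below (above) them.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
G = rng.uniform(0.0, 1.0, (n, n))
G *= 0.9 / G.sum(axis=1, keepdims=True).max()   # all row sums <= 0.9
f = rng.uniform(0.5, 1.5, n)

w = np.linalg.solve(np.eye(n) - G, f)           # exact solution of (13)

def iterate(h):
    """One application of the operator on the right in (14)."""
    return f + G @ h

row = G.sum(axis=1)
h_upper = np.full(n, np.max(f / (1.0 - row)))   # constant upper function
h_lower = np.full(n, np.min(f / (1.0 - row)))   # constant lower function

# Repeated iteration gives the functions h_s of (16).
uppers, lowers = [h_upper], [h_lower]
for _ in range(10):
    uppers.append(iterate(uppers[-1]))
    lowers.append(iterate(lowers[-1]))
```

Every pair in `lowers` and `uppers` brackets `w`, and the gap between them shrinks with each iteration.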

The following results may be useful in connection with Theorem 3 for the 
determination of upper and lower functions. 

THEOREM 5.

(i) If (12) is satisfied for M' = 1, the least upper bound and greatest lower bound
of the function f(x)/[1 − ∫_B G(dx_y | x)] over B are, respectively, the smallest
constant upper function and largest constant lower function for (13).

(ii) Let G*(A | x) and f*(x) be functions of the types G(A | x) and f(x). If the
solution w*(x) of the integral equation w*(x) = f*(x) + ∫_B w*(y) G*(dx_y | x) is
nonnegative, it is an upper function for (13) if f*(x) ≥ f(x) and G*(A | x) ≥
G(A | x) for all x in B and all sets A in ℬ'. It is a lower function for (13) if these
inequalities are reversed.

(iii) Let h(x) be an upper function for (13) and h_1(x) its iterate (14) and suppose
that u(x) is any measurable function satisfying [h_1(x)/h(x)] ≤ u(x) ≤ 1. Choosing
f*(x) = f(x) and G*(dx_y | x) = u(y) G(dx_y | x) in (ii), w*(x) is a lower function for
(13) and the average v(x) = ½[h_1(x) + w*(x)] is an upper function for (13) having
the property v(x) − w(x) ≤ ½[h(x) − w(x)].

PROOF. The proofs of (i), (ii) and the first part of (iii) are trivial and are
omitted. To see that v(x) defined in (iii) is an upper function for (13), it must be
shown that its iterate v_1(x) by the operator in (14) is dominated by v(x) over B.
This will be so if

        ∫_B [h_1(y) + w*(y)] G(dx_y | x) ≤ ∫_B [h(y) + w*(y)u(y)] G(dx_y | x).

Now w*(y) ≤ h(y) so

        ∫_B w*(y)[1 − u(y)] G(dx_y | x) ≤ ∫_B h(y)[1 − u(y)] G(dx_y | x)
                                        ≤ ∫_B h(y)[1 − h_1(y)/h(y)] G(dx_y | x).

The desired result follows at once from this inequality.
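
Parts (i) and (iii) of Theorem 5 can be illustrated on the same kind of finite problem. The sketch below is hypothetical throughout: B is a five-point set, `G` a random nonnegative matrix with row sums at most 0.8, and u is taken equal to its smallest admissible value h_1/h.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
G = rng.uniform(0.0, 1.0, (n, n))
G *= 0.8 / G.sum(axis=1, keepdims=True).max()
f = rng.uniform(0.5, 1.5, n)
I = np.eye(n)
w = np.linalg.solve(I - G, f)                       # exact solution of (13)

h = np.full(n, np.max(f / (1.0 - G.sum(axis=1))))   # part (i): constant upper function
h1 = f + G @ h                                      # its iterate (14)
u = h1 / h                                          # smallest u allowed in part (iii)
w_star = np.linalg.solve(I - G * u, f)              # G*(dy | x) = u(y) G(dy | x)
v = 0.5 * (h1 + w_star)                             # averaged upper function
v1 = f + G @ v                                      # iterate of v
```

The averaged function `v` remains an upper function while at least halving the error of `h`, which is the content of the final inequality in (iii).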
In cases where the integral equation (13) takes the simple form

(17)    w(x) = f(x) + ∫_a^b w(y) K(x, y) dy

with a continuous kernel K(x, y), another device for improving upper and lower
functions is available. Let R(x, y) be the kernel that is reciprocal to K(x, y).
That is, for each x in (a, b), R(x, y) satisfies the equation

        R(x, y) = K(x, y) + ∫_a^b K(x, z) R(z, y) dz.

Let H(x) denote the difference h(x) − h_1(x) between a function h(x) and its
iterate h_1(x) by the operator on the right in (17).

THEOREM 6. If h(x) is an upper [lower] function for the solution w(x) of (17)
and if, for each x in (a, b), the function J(x, y) is a lower [upper] function for
R(x, y), then the function H_1(x) defined by

        H_1(x) = h(x) − H(x) − ∫_a^b H(y) J(x, y) dy

is an upper [lower] function for w(x) and H_1(x) ≤ h(x) [H_1(x) ≥ h(x)].
PROOF. The proof will be indicated for the first case. The iterate H_2(x) of
H_1(x) by the operator on the right in (17) is easily shown to be given by

        H_2(x) = H_1(x)
                 + ∫_a^b H(y) {J(x, y) − R(x, y) + ∫_a^b K(x, z)[R(z, y) − J(z, y)] dz} dy.

The function J(x, y) is a lower function for R(x, y) so that the bracketed
expression in the integrand above is nonpositive. Thus H_2(x) ≤ H_1(x).

The reader will recognize a similarity between the conditions imposed upon 
the measure function G(A | x) and those imposed earlier upon the probability 
measure P(A | x). The definitions and results on upper and lower functions may 
be rephrased in such a way as to apply directly to the integral equation (9) and 
its special cases, if the condition (7) is replaced by the requirement that λ be a
positive real number for which ρλ^M < 1. This may be useful in some problems.
The applications of upper and lower functions to be made in the remainder of 
this paper are of a slightly different character and the discussion has been phrased 
in terms that are suitable for those applications. 


Part II. SOME ILLUSTRATIVE APPLICATIONS


4. A sequential probability ratio test. Wald’s sequential tests of hypotheses 
[10], [11] are based in part upon his theory of cumulative sums of independent,
identically distributed random variables [9]. As Kemperman [5] has shown, an 
alternative treatment is available in terms of the integral equations given in 
Part I above. The remainder of this paper will be given to a study of the integral 
equations for the risk probabilities and the expected sample size of the sequen- 
tial probability ratio test for simple alternatives on the parameter of a distribu- 
tion. It will be assumed that a sufficient statistic for the parameter exists. 

Let g(u; θ) be a probability density function on the real line of the form
g(u; θ) = exp{p(θ)k(u) + r(u) + q(θ)} in which k(u) and p(θ) are monotone
increasing functions of the variable u and the parameter θ respectively. The
sequential probability ratio test for the hypothesis θ = θ_1 against the alternative
θ = θ_2 will be considered at some length. (Girshick [3] treated this problem
by another method.)

Write ξ = p(θ) and v = k(u). By the monotoneity assumptions on these
functions, they have single-valued inverses θ = P(ξ) and u = K(v). Define
Q(ξ) = q[P(ξ)] and R(v) = r[K(v)]. The probability density function for the
variable v is

(18)    f(v; ξ) = K'(v) exp{ξv + R(v) + Q(ξ)}.

The test may be stated in terms of ξ_1 = p(θ_1) versus ξ_2 = p(θ_2), ξ_2 > ξ_1.
The logarithm of the probability ratio is

        z = log[g(u; θ_2)/g(u; θ_1)] = 2(v + r)Δ,
        2Δ = ξ_2 − ξ_1 and r = [Q(ξ_2) − Q(ξ_1)]/2Δ.

The cumulative distribution of z is

        F(z; ξ) = ∫_{−∞}^{z/2Δ} f(t − r; ξ) dt.

It will be assumed that a positive constant δ exists such that at least one of the
inequalities F(−δ; ξ) > 0 or 1 − F(δ; ξ) > 0 is satisfied.

To apply the general theory of sections 2 and 3 above to the Wald test of
ξ_1 versus ξ_2, proceed as follows. There are two terminal decisions, d_1: ξ = ξ_1
and d_2: ξ = ξ_2. Choose two positive constants a and b and let the decision
probability functions π_i(x), i = 1, 2, be the characteristic functions of the sets x ≤
−b and x ≥ a respectively. The complementary probability π_0(x) is the
characteristic function of the interval −b < x < a.

The random walk begins at an arbitrarily chosen real number x_0 (Wald's
specialization to x_0 = 0 will be made later). The successive points x_N, N = 0, 1,
2, ⋯ of the walk are the cumulative sums x_N = x_0 + Σ_{i=1}^{N} z_i of a sequence {z_i} of
independent values of the probability ratio z. By Corollaries 2 and 3, if ξ is the
true value of the parameter, the probability P_i(x_0; ξ) of making the decision
d_i, i = 1, 2, and the expected duration M_1(x_0; ξ) for a test that begins at x =
x_0 satisfy the integral equations:

(19)    2ΔP_1(x_0; ξ) =
            2Δ,                                                     x_0 ≤ −b,
            ∫_{−∞}^{−b} f((y − x_0)/2Δ − r; ξ) dy
              + ∫_{−b}^{a} P_1(y; ξ) f((y − x_0)/2Δ − r; ξ) dy,      −b < x_0 < a,
            0,                                                      x_0 ≥ a;

(20)    2ΔP_2(x_0; ξ) =
            0,                                                      x_0 ≤ −b,
            ∫_{a}^{∞} f((y − x_0)/2Δ − r; ξ) dy
              + ∫_{−b}^{a} P_2(y; ξ) f((y − x_0)/2Δ − r; ξ) dy,      −b < x_0 < a,
            2Δ,                                                     x_0 ≥ a;

(21)    2ΔM_1(x_0; ξ) =
            0,                                                      x_0 ≤ −b or x_0 ≥ a,
            2Δ + ∫_{−b}^{a} M_1(y; ξ) f((y − x_0)/2Δ − r; ξ) dy,     −b < x_0 < a.

The risk probabilities for Wald's test of θ_1 against θ_2 are as follows. The
probability of rejecting θ_2 when it is correct is the value P_1(0; ξ_2) of the solution of
(19). The probability of rejecting θ_1 when it is correct is the value P_2(0; ξ_1) of
the solution of (20). Some consideration will be given later to the approximate
evaluation of these risks and the expected sample size M_1(0; ξ).
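
Equations (19) and (21) can be solved numerically by replacing the integral over (−b, a) with a quadrature rule. The sketch below is one such discretization (a Nyström scheme with Gauss-Legendre nodes) for the case in which the increment z is normally distributed; it is an illustration under assumptions made here, not the method of the paper. The function `sprt_normal`, its arguments `mean` and `sd` (the mean and standard deviation of z under the true parameter) and the number of nodes are hypothetical, and the equations are written in terms of the density of z itself, so that the factor 2Δ is absorbed.

```python
import math
import numpy as np

def sprt_normal(a, b, mean, sd, nodes=80):
    """Nystrom solution of (19) and (21) when z ~ N(mean, sd^2).

    Returns functions P1(x0) and M1(x0), valid for -b < x0 < a.
    """
    t, wts = np.polynomial.legendre.leggauss(nodes)
    y = 0.5 * (a + b) * t + 0.5 * (a - b)          # nodes mapped to (-b, a)
    wts = 0.5 * (a + b) * wts

    def dens(z):
        return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

    def cdf(z):
        return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))

    K = dens(y[None, :] - y[:, None]) * wts[None, :]   # K[j, l] = density(y_l - y_j) * w_l
    A = np.eye(nodes) - K
    P1_nodes = np.linalg.solve(A, np.array([cdf(-b - x) for x in y]))
    M1_nodes = np.linalg.solve(A, np.ones(nodes))

    def P1(x0):
        return cdf(-b - x0) + float(np.sum(wts * dens(y - x0) * P1_nodes))

    def M1(x0):
        return 1.0 + float(np.sum(wts * dens(y - x0) * M1_nodes))

    return P1, M1

# Symmetric driftless case: both decisions are equally likely from x0 = 0.
P1, M1 = sprt_normal(2.0, 2.0, 0.0, 1.0)
# Case with positive drift, as under the alternative xi_2 with d = 1.
P1_drift, M1_drift = sprt_normal(2.0, 2.0, 0.5, 1.0)
```

In the drift case `P1_drift(0.0)` plays the role of P_1(0; ξ_2) and stays below Wald's bound exp(−b), while `M1(0.0)` exceeds the value ab obtained by neglecting the excess over the boundaries.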

The dispersion of the sample size could be studied from its variance. By
Corollary 3 the second moment of the distribution of the sample size satisfies the
equation obtained by replacing the leading term 2Δ on the right in (21) by

        2Δ(2M_1(x_0; ξ) − 1).

It might be better to study the dispersion of n directly from its distribution.
If x_0 and ξ have the same meanings as above and if k is an arbitrary integer,
P(n ≥ k; x_0, ξ) = Σ_{j=k}^{∞} p_j(x_0; ξ), where the terms on the right are (3) with
their dependence upon ξ put into evidence. Applying the recurrence relations (2),

(22)    P(n ≥ k; x_0, ξ) = p_k(x_0; ξ)
            + (2Δ)^{−1} ∫_{−b}^{a} P(n ≥ k; y, ξ) f((y − x_0)/2Δ − r; ξ) dy,    −b < x_0 < a.

The initial term on the right in (22) is given by the iterated integral

        p_k(x_0; ξ) = (2Δ)^{−k} ∫_{−b}^{a} dx_1 ⋯ ∫_{−b}^{a} dx_{k−1} [∫_{−∞}^{−b} + ∫_{a}^{∞}] dx_k
                          ∏_{i=1}^{k} f((x_i − x_{i−1})/2Δ − r; ξ).



Since the normal distribution has the form (18) with the mean as the param- 
eter, the results to be obtained are applicable to that case. The binomial and 
Poisson probability functions also have the form (18). Thus, with proper care 
in interpretation of integrals as sums, the results may also be used for tests of 
hypotheses on the proportion p of the binomial distribution or the mean of the 
Poisson distribution. In the latter connection see Herbach [4]. 


5. Monotone character of the risk probabilities. It is important to note
conditions under which, for each fixed x_0 in the interval (−b, a), the probabilities
P_1(x_0; ξ) and P_2(x_0; ξ) are respectively monotone decreasing and monotone
increasing functions of the parameter ξ. Consider (19). Substituting t = y/2Δ and
t_0 = x_0/2Δ and using the identity

        f(t − t_0 − r; ξ)
            = f(t − t_0 − r; ξ_2) exp{(ξ − ξ_2)(t − t_0 − r) + Q(ξ) − Q(ξ_2)}

to display dependence upon ξ, iteration of (19) gives the series solution

(23)    P_1(2t_0Δ; ξ)
            = ∫_{−∞}^{−b/2Δ} f(t − t_0 − r; ξ_2) exp{(ξ − ξ_2)(t − t_0 − r) + Q(ξ) − Q(ξ_2)} dt
              + Σ_{N=1}^{∞} ∫_{−b/2Δ}^{a/2Δ} dt_1 ⋯ ∫_{−b/2Δ}^{a/2Δ} dt_N ∫_{−∞}^{−b/2Δ} [∏_{i=1}^{N+1} f(t_i − t_{i−1} − r; ξ_2)]
                  · exp{(ξ − ξ_2)[t_{N+1} − t_0 − (N + 1)r]
                        + (N + 1)[Q(ξ) − Q(ξ_2)]} dt_{N+1}.

Differentiating (23) with respect to ξ termwise under the integral signs has the
effect of inserting the quantities {t_{N+1} − t_0 + (N + 1)[Q'(ξ) − r]} as factors
in the respective integrands of the terms of (23). Since the ranges of t_0 and t_{N+1}
make the difference t_{N+1} − t_0 negative, it is clear that the condition


(24) Q’(—) s 0 for all & 


is sufficient to make P_1(x_0; ξ) monotone decreasing in ξ in the range ξ ≥ ξ_2.
Operating similarly with (20), the condition (24) is also sufficient for P_2(x_0; ξ)
to be monotone increasing in ξ in the range ξ ≤ ξ_1. It is assumed that the formal
manipulations can be justified.

These monotone properties of the exact risk functions are important in the
extension of the validity of the test of the simple alternatives ξ_1 and ξ_2 to
composite alternatives ξ ≤ ξ_1 and ξ ≥ ξ_2.


6. Bounds for the solutions of (19), (20) and (21). Convergent series solutions 
such as (23) are available for all of the integral equations associated with the 
sequential test described in Section 4. They appear to be useless in general for 
computational purposes. The possibilities for applying the upper and lower func- 
tion concepts discussed in Section 3 will be illustrated here by a particularly 
simple analysis that yields bounds for the solutions of (19), (20) and (21). 
Specializations are given in the next section to the test of a simple hypothesis 
and alternative on the mean of a normal distribution with known variance. A 
comparison with Wald’s bounds [10] and [11] on the risk probabilities and ex- 
pected sample size of the test is given there. Kemperman [5] obtained much 
weaker results of the same type under weaker hypotheses on F(z; £). His methods 
were similar to those to be used here. The potentialities of the theory for further 
work should be evident. To obtain bounds for the risk probabilities and expected
sample size that are substantially better than those given below is clearly pos- 
sible, but involves extensive numerical work. 

Consider the problem of finding upper and lower functions for the solution of
the integral equation (19) when ξ = ξ_2. Iteration of a constant by the operator
on the right in (19) is simple. Also, since

(25)    e^{−y} f((y − x)/2Δ − r; ξ_2) = e^{−x} f((y − x)/2Δ − r; ξ_1),

iteration of the function e^{−x} is simple. Upper and lower functions for P_1(x; ξ_2)
will be obtained from linear combinations of the pair (1, e^{−x}). It is convenient
to use the form

(26)    h(x; δ, γ) = (e^{−x} − γ)/(δ − γ),    δ − γ > 0,

where δ and γ are constants. Write G(z; ξ) = 1 − F(z; ξ) where F(z; ξ) is as
defined in Section 4. For (26) to be an upper function for P_1(x; ξ_2), the inequality

(27)    δF(−b − x; ξ_2) + γG(a − x; ξ_2) ≤ e^{−x}F(−b − x; ξ_1) + e^{−x}G(a − x; ξ_1)

is sufficient. Reversal of this inequality will make (26) a lower function for
P_1(x; ξ_2). Evidently (27) will be satisfied by the pair of constants δ = δ_1 and
γ = γ_1 defined by

(28)    δ_1 = min_{(−b,a)} e^{−x}F(−b − x; ξ_1)/F(−b − x; ξ_2),
        γ_1 = min_{(−b,a)} e^{−x}G(a − x; ξ_1)/G(a − x; ξ_2).

The reverse inequality to (27) will be satisfied by the pair δ = δ_2 and γ = γ_2
defined by

(29)    δ_2 = max_{(−b,a)} e^{−x}F(−b − x; ξ_1)/F(−b − x; ξ_2),
        γ_2 = max_{(−b,a)} e^{−x}G(a − x; ξ_1)/G(a − x; ξ_2).

Thus, the functions h(x; δ_i, γ_i), i = 1, 2, obtained by using (28) and (29) in
(26) will be respectively upper and lower bounds for the solution P_1(x; ξ_2) of
(19) when ξ = ξ_2.

A similar argument using the integral equation (20) with $\xi = \xi_1$ and the form

$g(x; \delta', \gamma') = (e^{x} - \gamma')/(\delta' - \gamma')$

leads to upper and lower functions for $P_2(x; \xi_1)$. For an upper function use

(30) $\delta_1' = \min_{(-b,a)} \dfrac{e^{x} G(a - x; \xi_2)}{G(a - x; \xi_1)}, \qquad \gamma_1' = \min_{(-b,a)} \dfrac{e^{x} F(-b - x; \xi_2)}{F(-b - x; \xi_1)}$.

For a lower function use the pair $\delta_2'$ and $\gamma_2'$ obtained by using maxima in (30)
instead of minima.


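Consider next the integral equation (21) for the first moment $M_1(x; \xi)$ of the
duration of the sequential experiment. Let $\mu = \int (v + r) f(v; \xi)\,dv$ and
$\nu = \int (v + r)^2 f(v; \xi)\,dv$ where $f(v; \xi)$ is given by (18). For any choice of a constant
$k$ one has

(31) $\mu + \dfrac{x + k}{2\Delta} = (2\Delta)^{-1} \int_{-\infty}^{\infty} \dfrac{y + k}{2\Delta}\, f\left(\dfrac{y - x}{2\Delta} - r;\ \xi\right) dy$,

     $\nu + 2\mu\,\dfrac{x + k}{2\Delta} + \left(\dfrac{x + k}{2\Delta}\right)^2 = (2\Delta)^{-1} \int_{-\infty}^{\infty} \left(\dfrac{y + k}{2\Delta}\right)^2 f\left(\dfrac{y - x}{2\Delta} - r;\ \xi\right) dy$.

Let $\lambda_1$, $\lambda_2$ and $k$ be any constants and define the function

(32) $R(x; \xi, \lambda_1, \lambda_2, k) = (\lambda_1 \mu + \lambda_2 \nu) M_1(x; \xi) + \lambda_1 \dfrac{x + k}{2\Delta} + \lambda_2 \left(\dfrac{x + k}{2\Delta}\right)^2$.

Using the integral equation (21) and the identities (31), it is easily shown that
the function (32) satisfies the integral equation

(33) $R(x; \xi, \lambda_1, \lambda_2, k) = g(x; \xi, \lambda_1, \lambda_2, k) + (2\Delta)^{-1} \int_{-b}^{a} R(y; \xi, \lambda_1, \lambda_2, k)\, f\left(\dfrac{y - x}{2\Delta} - r;\ \xi\right) dy$.

The leading term is defined by

(34) $g(x; \xi, \lambda_1, \lambda_2, k) = -\Delta^{-1} \lambda_2 \mu (x + k) + (2\Delta)^{-1} \left(\int_{-\infty}^{-b} + \int_{a}^{\infty}\right) \left[\lambda_1 \dfrac{y + k}{2\Delta} + \lambda_2 \left(\dfrac{y + k}{2\Delta}\right)^2\right] f\left(\dfrac{y - x}{2\Delta} - r;\ \xi\right) dy$.

Let $A_1$ and $A_2$ be further constants and define the functions

(35) $S_i(x; \xi, \lambda_1, \lambda_2, k) = (-1)^{i-1} [R(x; \xi, \lambda_1, \lambda_2, k) - A_i P_i(x; \xi)], \quad i = 1, 2$.

Using (19), (20) and (33), one finds that the functions (35) satisfy the integral
equations

(36) $S_i(x; \xi, \lambda_1, \lambda_2, k) = f_i(x; \xi, \lambda_1, \lambda_2, k) + (2\Delta)^{-1} \int_{-b}^{a} S_i(y; \xi, \lambda_1, \lambda_2, k)\, f\left(\dfrac{y - x}{2\Delta} - r;\ \xi\right) dy$

with leading terms defined in terms of (34) and the constants $A_i$ by

(37) $f_1(x; \xi, \lambda_1, \lambda_2, k) = g(x; \xi, \lambda_1, \lambda_2, k) - A_1 F(-b - x; \xi)$,
     $f_2(x; \xi, \lambda_1, \lambda_2, k) = A_2 G(a - x; \xi) - g(x; \xi, \lambda_1, \lambda_2, k)$.

If, for some specific choice of $\lambda_1$, $\lambda_2$ and $k$, the constants $A_1$ and $A_2$ are chosen
so that the functions (37) are nonnegative on the interval $-b \le x \le a$, the
solutions (35) of the integral equations (36) will be nonnegative on that interval.
It follows that, for such choices of the $A_i$ with distinct values $k_i$, $i = 1, 2$, of
the constant $k$, one has bounds for the quantity $(\lambda_1 \mu + \lambda_2 \nu) M_1(x; \xi)$ given by

(38) $A_1 P_1(x; \xi) - \lambda_1 \dfrac{x + k_1}{2\Delta} - \lambda_2 \left(\dfrac{x + k_1}{2\Delta}\right)^2 \le (\lambda_1 \mu + \lambda_2 \nu) M_1(x; \xi) \le A_2 P_2(x; \xi) - \lambda_1 \dfrac{x + k_2}{2\Delta} - \lambda_2 \left(\dfrac{x + k_2}{2\Delta}\right)^2$.

If $\mu$ is not close to zero, the choices $\lambda_1 = 1$ and $\lambda_2 = 0$ are useful and the
bounds (38) are of the type given by Wald in [10]. If $\mu$ is close to zero, a better
choice is $\lambda_1 = 0$ and $\lambda_2 = 1$; in this case the bounds (38) are of the type given
by Wald [13], using x = 0.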


7. Examples. Suppose that the distribution (18) is

$f(v; \xi) = \varphi(v - \xi) = (2\pi)^{-1/2} \exp(\xi v - \tfrac12 v^2 - \tfrac12 \xi^2)$.

The sequential test under consideration provides a test on one value $\xi_1$ of the
mean against a larger value $\xi_2$. It is clear that all of the hypotheses that have
been placed upon (18) are satisfied.

In this case the functions to be minimized or maximized in (28), (29) and (30)
are monotone. For example, consider, in (28) and (29), the functions

$\chi_1(x) = e^{-x} F(-b - x; \xi_1)/F(-b - x; \xi_2)$
and
$\chi_2(x) = e^{-x} G(a - x; \xi_1)/G(a - x; \xi_2)$.

One finds that $r + \xi_1 = -\Delta$ and $r + \xi_2 = \Delta$. On setting $t = (b + x)/2\Delta$ in
$\chi_1(x)$ and $t' = (a - x)/2\Delta$ in $\chi_2(x)$, these functions take the forms

$\chi_1(x) = e^{b - 2\Delta t}\, G_1(t - \Delta)/G_1(t + \Delta)$
and
$\{\chi_2(x)\}^{-1} = e^{a - 2\Delta t'}\, G_1(t' - \Delta)/G_1(t' + \Delta)$.

Wald ([10], pp. 140-141) has shown that the function of $t$ or of $t'$ involved here
is a monotone decreasing function of its argument. It follows that $\chi_1(x)$ and
$\chi_2(x)$ are monotone decreasing functions of $x$.

Using $G_1(u) = (2\pi)^{-1/2} \int_u^{\infty} \exp(-\tfrac12 x^2)\,dx$ and $\Lambda = (a + b)/2\Delta$, the definitions
(28) and (29) give

(39) $\delta_1 = e^{-a}\, G_1(\Lambda - \Delta)/G_1(\Lambda + \Delta), \qquad \gamma_1 = e^{-a}\, G_1(\Delta)/G_1(-\Delta)$,
     $\delta_2 = e^{b}\, G_1(-\Delta)/G_1(\Delta), \qquad \gamma_2 = e^{b}\, G_1(\Lambda + \Delta)/G_1(\Lambda - \Delta)$.

Using (39) in (26) and evaluating at $x = 0$, one has the bounds

(40) $(1 - \gamma_2)/(\delta_2 - \gamma_2) \le P_1(0; \xi_2) \le (1 - \gamma_1)/(\delta_1 - \gamma_1)$

for the probability of accepting $\xi_1$ when $\xi_2$ is correct in Wald's test. Replacement
of $\delta_1$ by $e^{b}$ and $\gamma_2$ by $e^{-a}$ in (40) gives the bounds obtained by Wald. It will be
shown that $\delta_1 > e^{b}$ and $\gamma_2 < e^{-a}$, from which it will follow that (40) gives better
bounds than those quoted. Actually, the improvement is only very slight.
Further improvements will be indicated in the next section.

From the asymptotic series $G_1(u) = \varphi(u)[u^{-1} - u^{-3} + 3u^{-5} - \cdots]$, valid
for large positive $u$, one obtains

$\delta_1 = e^{b}\,[(\Lambda - \Delta)^{-1} - (\Lambda - \Delta)^{-3} + \cdots]/[(\Lambda + \Delta)^{-1} - (\Lambda + \Delta)^{-3} + \cdots]$,
$\gamma_2 = e^{-a}\,[(\Lambda + \Delta)^{-1} - (\Lambda + \Delta)^{-3} + \cdots]/[(\Lambda - \Delta)^{-1} - (\Lambda - \Delta)^{-3} + \cdots]$,

where $\Lambda$ has the same meaning as in (39). Obviously $\delta_1 > e^{b}$ and $\gamma_2 < e^{-a}$. The
bounds for $P_2(x; \xi_1)$ may be treated in a similar way.

Wald [10] gave an explicit calculation of bounds for the expected duration
of the experiment for the case in which $\mu$ is not close to zero. His results are
derivable from (38). To see this, choose $\lambda_1 = 1$, $\lambda_2 = 0$, $k_1 = -a$ and $k_2 = b$.
For the functions (37) to be nonnegative on $-b \le x \le a$, it is sufficient that

$A_1 \le g(x; \xi, 1, 0, -a)/F(-b - x; \xi)$,
$A_2 \ge g(x; \xi, 1, 0, b)/G(a - x; \xi)$

be satisfied on that range. These inequalities are satisfied by

(41) $A_1 = \min_{(-b,a)} \left[-\Lambda + \chi_3\left(\dfrac{b + x}{2\Delta} + \xi + r\right)\right], \qquad A_2 = \max_{(-b,a)} \left[\Lambda - \chi_3\left(\dfrac{a - x}{2\Delta} - \xi - r\right)\right]$,

where $\Lambda$ has the same meaning as in (39) and

$\chi_3(t) = t - \int_t^{\infty} x \varphi(x)\,dx / G_1(t)$.

The function $\varphi(x)$ is the standardized normal probability density. By Wald
[10, p. 144], $\chi_3(t)$ is a monotone increasing function of $t$. It follows from (41) that

$A_1 = -\Lambda + \xi + r - \varphi(\xi + r)/G_1(\xi + r)$

and

$A_2 = \Lambda + \xi + r + \varphi(\xi + r)/(1 - G_1(\xi + r))$.

Evaluation of (38) at $x = 0$ using the various constants chosen above gives
results that are in agreement with (4.13), (4.14), (4.20) and (4.24) of Wald [10].

As remarked in Section 4, the binomial and Poisson probability functions
have the form (18). With proper care in the interpretation of integrals as sums,
the results of Section 6 may be adapted for tests of simple hypotheses and
alternatives on the parameters of these distributions. Again the results are
comparable with those obtained by Wald [10] and Herbach [4]. Since the normal
example given above furnishes an adequate comparison of this type, explicit
calculations will be omitted here.


TABLE 1

Bounds on $P_1(0, \xi_2)$ for a = b = 2 and $\Delta$ = 0.25

                    Lower      Upper

By Wald             0.07941    0.12459
By (39), (40)       0.08008    0.11717
Optimal             0.08089    0.10997


8. Possible further improvements and applications. The upper and lower
bounds derived in Section 6 for the risk probabilities and expected sample size
tend toward the exact solutions of their respective integral equations as $\Delta$
approaches zero. For small values of $\Delta$ these bounds may be sufficiently close to the
exact solutions to be regarded as solutions by the designer of practical
experiments. For values of $\Delta$ greater than .01, say, more accurate solutions are needed.

For specific problems such as the test considered in detail in Section 7, it
should be possible to choose the constants involved in the form (26) in such a
manner as to minimize the upper function and maximize the lower function.
Explicitly, find the largest values of $\delta$ and $\gamma$ such that $\delta - \gamma > 0$ and (27) is
satisfied and the smallest values for which the reverse inequality to (27) is
satisfied. The procedure is numerical and involves trial and error. In the normal
example of Section 7, it is found that the values of $\gamma_1$ and $\delta_2$ given in (28) and
(29) are the best possible values, but that $\delta_1$ and $\gamma_2$ may be improved. In the
special case of that example defined by a = b = 2 and $\Delta$ = 0.25, the equations
(39) give $\delta_1$ = 7.8514, $\gamma_1$ = 0.09071, $\delta_2$ = 11.024 and $\gamma_2$ = 0.12736. The optimal
values are $\gamma_2$ = 0.11785 and $\delta_1$ = 8.3593. For this case the bounds on $P_1(0, \xi_2)$
given by Wald [10], by (39) and (40), and by the above optimal choices of
constants are shown in Table 1. This example shows that further improved bounds
are needed.

It is interesting to note that the approximate value $P_1(0, \xi_2) = (1 - e^{-a})/(e^{b} - e^{-a})$
obtained from equation (3.35) of Wald [10] gives the value 0.11920
for the numerical example tabulated above. Since this value lies outside the
range allowed by the optimal bounds, a need is indicated for a better
approximate formula for $P_1(0, \xi_2)$.
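The numerical example can be checked with a short calculation. The sketch below is an illustration only: it assumes the normal case of Section 7, evaluates the constants (39) with the standard library's complementary error function, and then forms the bounds (40), Wald's bounds, and the approximate value quoted from Wald. The function names (`upper_tail`, `bound_constants`, `h`) are hypothetical and chosen here for readability.

```python
import math


def upper_tail(u):
    """G_1(u): upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def bound_constants(a, b, delta):
    """Constants delta_1, gamma_1, delta_2, gamma_2 of (39); lam = (a + b) / (2 delta)."""
    lam = (a + b) / (2.0 * delta)
    tail_ratio = upper_tail(lam - delta) / upper_tail(lam + delta)
    centre_ratio = upper_tail(delta) / upper_tail(-delta)
    delta1 = math.exp(-a) * tail_ratio
    gamma1 = math.exp(-a) * centre_ratio
    delta2 = math.exp(b) / centre_ratio
    gamma2 = math.exp(b) / tail_ratio
    return delta1, gamma1, delta2, gamma2


def h(x, d, g):
    """The form (26): (exp(-x) - gamma) / (delta - gamma)."""
    return (math.exp(-x) - g) / (d - g)


a, b, delta = 2.0, 2.0, 0.25
d1, g1, d2, g2 = bound_constants(a, b, delta)

# Bounds (40) on P_1(0; xi_2).
lower, upper = h(0.0, d2, g2), h(0.0, d1, g1)

# Wald's bounds: gamma_2 replaced by exp(-a), delta_1 replaced by exp(b).
wald_lower = h(0.0, d2, math.exp(-a))
wald_upper = h(0.0, math.exp(b), g1)

# Approximate value (1 - e^{-a}) / (e^{b} - e^{-a}).
wald_approx = (1.0 - math.exp(-a)) / (math.exp(b) - math.exp(-a))

print(f"(39): {d1:.4f} {g1:.5f} {d2:.3f} {g2:.5f}")
print(f"(40): {lower:.5f} <= P_1 <= {upper:.5f}")
print(f"Wald: {wald_lower:.5f} <= P_1 <= {wald_upper:.5f}, approx {wald_approx:.5f}")
```

The optimal constants quoted in the text come from a trial-and-error search that is not reproduced here.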

Various devices are known for the numerical approximation of solutions of 
integral equations. One or more such devices might be used in combination with 
the upper and lower function concepts with considerable success in a well 
equipped computing laboratory. In particular, careful attention should be given 
to the possible utility of Theorem 6. 

The versatility of the theory presented in Part I should make it a useful tool
in the definition and study of (1) more general termination rules than that used
in Part II, (2) multidimensional problems, and (3) decision problems that
involve more than two terminal decisions. The writer hopes to investigate such
applications at a later date and hopes that this paper will serve to interest others
in joining the search.
SEQUENTIAL PROCEDURES IN CERTAIN COMPONENT OF
VARIANCE PROBLEMS


By N. L. Johnson
University of North Carolina and University College London


1. Introduction and Summary. The primary purpose of this paper is to 
discuss certain sequential procedures for discriminating between two values for 
the ratio of variance components in a simple one-way classification. In order to 
clarify the presentation of the three procedures discussed, they will first be 
applied (in Section 2) to the problem of discrimination between two values for 
the ratio of variances in two distinct normal populations. The arguments in 
Section 3 closely parallel those in Section 2, and the form of the two sections has 
been made as similar as possible, to emphasize this parallelism. Section 4 contains 
some formulae used in calculating approximate average sampling numbers 
presented in the earlier sections. 


2. Comparison of two variances. Let $\Pi_1$, $\Pi_2$ be two normal populations with
variances $\sigma_1^2$, $\sigma_2^2$ respectively. It is desired to discriminate between the
hypotheses $H'(\sigma_1^2/\sigma_2^2 = \theta')$ and $H''(\sigma_1^2/\sigma_2^2 = \theta'')$. It has been shown [1] that the
following sequential procedure (I) will give approximate probabilities $\alpha'(\alpha'')$ of
choice of $H''(H')$ when $H'(H'')$ is valid.

PROCEDURE (I). "Start by taking samples of two $(x_{11}, x_{12}; x_{21}, x_{22})$ from $\Pi_1$, $\Pi_2$
respectively. At each subsequent stage (if required) one further individual is
taken from each population.

"At the (n - 1)st stage calculate

(1) $g_n = \sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 \Big/ \sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2$.

Accept $H'$ if $p(g_n \mid \theta'')/p(g_n \mid \theta') < A$.
Accept $H''$ if $p(g_n \mid \theta'')/p(g_n \mid \theta') > B$.
Otherwise proceed to the nth stage."
In the above statement $A = \alpha''/(1 - \alpha')$, $B = (1 - \alpha'')/\alpha'$; and

(2) $p(g_n \mid \theta) = \dfrac{1}{B(\tfrac12(n - 1), \tfrac12(n - 1))} \cdot \dfrac{\theta^{\frac12(n-1)}\, g_n^{\frac12(n-3)}}{(\theta + g_n)^{n-1}}$

so that

$p(g_n \mid \theta'')/p(g_n \mid \theta') = (\theta''/\theta')^{\frac12(n-1)}\, [(\theta' + g_n)/(\theta'' + g_n)]^{n-1}$.
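One stage of Procedure (I) can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the author's computation: it takes the statistic $g_n$ as the ratio of the two sums of squared deviations in (1), works with the logarithm of the likelihood ratio that follows from (2), and compares it with $\log A$ and $\log B$. All function names are hypothetical.

```python
import math


def variance_ratio_statistic(sample1, sample2):
    """g_n of (1): ratio of the two sums of squares about the sample means."""
    n = len(sample1)
    mean1 = sum(sample1) / n
    mean2 = sum(sample2) / n
    top = sum((x - mean1) ** 2 for x in sample1)
    bottom = sum((x - mean2) ** 2 for x in sample2)
    return top / bottom


def log_likelihood_ratio(g, n, theta1, theta2):
    """log of p(g_n | theta'') / p(g_n | theta'); theta1 = theta', theta2 = theta''."""
    return (0.5 * (n - 1) * math.log(theta2 / theta1)
            + (n - 1) * math.log((theta1 + g) / (theta2 + g)))


def procedure_one_decision(sample1, sample2, theta1, theta2, alpha1, alpha2):
    """Decision after the (n - 1)st stage; alpha1 = alpha', alpha2 = alpha''."""
    log_a = math.log(alpha2 / (1.0 - alpha1))
    log_b = math.log((1.0 - alpha2) / alpha1)
    g = variance_ratio_statistic(sample1, sample2)
    llr = log_likelihood_ratio(g, len(sample1), theta1, theta2)
    if llr < log_a:
        return "accept H'"
    if llr > log_b:
        return "accept H''"
    return "continue"


# Two observations from each population: g_2 = 2 / 2 = 1.
print(procedure_one_decision([1.0, 3.0], [2.0, 4.0], 1.0, 4.0, 0.05, 0.05))
```

Working on the logarithmic scale avoids overflow in the powers when n is large.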


The validity of procedure (I) can be demonstrated by showing that it is
equivalent to a sequential likelihood ratio test based on the sequence $\{g_n\}$.
These variables, however, are not independent and do not have a common
distribution so that the approximate formulae for the ASN (average sample
number) given by Wald [2] cannot be applied. Two alternative procedures, (II) and
(III), will now be described. It is possible to use the approximate ASN formulae
for these procedures.

PROCEDURE (II). "At each stage take a sample of r individuals from each of
$\Pi_1$ and $\Pi_2$. Calculate $g_r$ by (1) for these 2r values and call it $g_{r,n}$ if based on the
sample available at the nth stage.

Accept $H'$ if $\prod_{j=1}^{n} [p(g_{r,j} \mid \theta'')/p(g_{r,j} \mid \theta')] < A$.

Accept $H''$ if $\prod_{j=1}^{n} [p(g_{r,j} \mid \theta'')/p(g_{r,j} \mid \theta')] > B$.

Otherwise proceed to the (n + 1)st stage."
In the above statement $p(g_{r,j} \mid \theta)$ is given by (2) with $g_n$ replaced by $g_{r,j}$ and
n by r. Hence

$\prod_{j=1}^{n} [p(g_{r,j} \mid \theta'')/p(g_{r,j} \mid \theta')] = (\theta''/\theta')^{\frac12 n(r-1)} \prod_{j=1}^{n} [(\theta' + g_{r,j})/(\theta'' + g_{r,j})]^{r-1}$.

PROCEDURE (III). Here we use the same sampling scheme as in (I), but a
different decision rule.

"Start by taking samples of two $(x_{11}, x_{12}; x_{21}, x_{22})$ from $\Pi_1$, $\Pi_2$ respectively.
At each subsequent stage (if required) one further individual is taken from
each population.

"At the (n - 1)st stage calculate

(3) $g_n' = \dfrac{(x_{11} + x_{12} + \cdots + x_{1,n-1} - (n - 1)x_{1n})^2}{(x_{21} + x_{22} + \cdots + x_{2,n-1} - (n - 1)x_{2n})^2}$.

Accept $H'$ if $\prod_{j=2}^{n} [p(g_j' \mid \theta'')/p(g_j' \mid \theta')] < A$.

Accept $H''$ if $\prod_{j=2}^{n} [p(g_j' \mid \theta'')/p(g_j' \mid \theta')] > B$.

Otherwise proceed to the nth stage."

In the above statement

(4) $p(g_j' \mid \theta) = [p(g_2 \mid \theta)]_{g_2 = g_j'}$,

where $p(g_2 \mid \theta)$ is defined as in (2), so that

$\prod_{j=2}^{n} [p(g_j' \mid \theta'')/p(g_j' \mid \theta')] = (\theta''/\theta')^{\frac12(n-1)} \prod_{j=2}^{n} (\theta' + g_j')/(\theta'' + g_j')$.

In Procedure (II) the $g_{r,j}$'s are evidently mutually independent and have a
common distribution. In Procedure (III) the quantities

$y_{tj} = (x_{t1} + x_{t2} + \cdots + x_{t,j-1} - (j - 1)x_{tj})/\sqrt{j(j - 1)}$

are mutually independent normal variables [3] each with expected value zero
and variance $\sigma_t^2$ (t = 1, 2). Hence the $g_j'$'s are mutually independent and have the
common distribution (4).
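The contrasts $y_{tj}$ are easy to compute incrementally, since each new one needs only the running sum of the earlier observations. The sketch below is illustrative; it assumes the stagewise ratios of (3) can be formed as $(y_{1j}/y_{2j})^2$, which follows because the common factor $\sqrt{j(j-1)}$ cancels. The function names are hypothetical. Because the contrasts form an orthonormal set orthogonal to the mean, their squares add up to the sum of squared deviations, which gives a convenient check.

```python
import math


def helmert_contrasts(xs):
    """y_j = (x_1 + ... + x_{j-1} - (j - 1) x_j) / sqrt(j (j - 1)) for j = 2, ..., n."""
    contrasts = []
    running_sum = 0.0
    for j, x in enumerate(xs, start=1):
        if j >= 2:
            contrasts.append((running_sum - (j - 1) * x) / math.sqrt(j * (j - 1)))
        running_sum += x
    return contrasts


def stagewise_ratios(sample1, sample2):
    """g'_j for j = 2, ..., n, formed from the contrasts of the two samples."""
    pairs = zip(helmert_contrasts(sample1), helmert_contrasts(sample2))
    return [(y1 / y2) ** 2 for y1, y2 in pairs]


xs = [1.0, 4.0, 2.0, 7.0]
mean = sum(xs) / len(xs)
sum_of_squares = sum((x - mean) ** 2 for x in xs)
sum_of_contrasts = sum(y * y for y in helmert_contrasts(xs))
print(sum_of_squares, sum_of_contrasts)
```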

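In both (II) and (III) therefore we can use Wald's approximate formulae
for the ASN. The results of applying these formulae are given below in (6) to
(9). In these formulae

$a = \alpha' \log B + (1 - \alpha') \log A; \qquad b = \alpha'' \log A + (1 - \alpha'') \log B$

and

(5) $E(\nu_1, \nu_2; \theta) = \dfrac{\theta^{\frac12 \nu_2}}{B(\tfrac12 \nu_1, \tfrac12 \nu_2)} \int_0^{\infty} \dfrac{g^{\frac12 \nu_1 - 1}}{(\theta + g)^{\frac12(\nu_1 + \nu_2)}} \log\left[\left(\dfrac{\theta''}{\theta'}\right)^{\frac12 \nu_2} \left(\dfrac{\theta' + g}{\theta'' + g}\right)^{\frac12(\nu_1 + \nu_2)}\right] dg$.

For Procedure (II), the approximate ASN are

(6) $2ra/E(r - 1, r - 1; \theta')$, when $H'$ is true,

(7) $2rb/E(r - 1, r - 1; \theta'')$, when $H''$ is true.

For Procedure (III), the approximate ASN are

(8) $2 + 2a/E(1, 1; \theta')$, when $H'$ is true,

(9) $2 + 2b/E(1, 1; \theta'')$, when $H''$ is true.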
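The quantity $E(\nu_1, \nu_2; \theta)$ can be evaluated numerically. The following sketch is not the method used in the paper; it assumes the substitution $g = \theta \tan^2\phi$, under which $g/(\theta + g)$ has a beta distribution and the integral in (5) becomes a bounded integral over $(0, \pi/2)$ that a midpoint rule handles well. The values $\theta' = 1$, $\theta'' = 2.5$ and $\alpha' = \alpha'' = 0.05$ are chosen for illustration only, and all function names are hypothetical.

```python
import math


def expected_log_ratio(nu1, nu2, theta, theta1, theta2, steps=4000):
    """Numerical value of E(nu1, nu2; theta) in (5); theta1 = theta', theta2 = theta''."""
    a, b = 0.5 * nu1, 0.5 * nu2
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    width = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * width
        s, c = math.sin(phi), math.cos(phi)
        g = theta * (s / c) ** 2
        # Density of phi when sin^2(phi) has a Beta(a, b) distribution.
        weight = 2.0 * s ** (2 * a - 1) * c ** (2 * b - 1) / math.exp(log_beta)
        total += weight * math.log((theta1 + g) / (theta2 + g))
    return 0.5 * nu2 * math.log(theta2 / theta1) + 0.5 * (nu1 + nu2) * total * width


def wald_constants(alpha1, alpha2):
    """The constants a and b; alpha1 = alpha', alpha2 = alpha''."""
    log_a = math.log(alpha2 / (1.0 - alpha1))
    log_b = math.log((1.0 - alpha2) / alpha1)
    return (alpha1 * log_b + (1.0 - alpha1) * log_a,
            alpha2 * log_a + (1.0 - alpha2) * log_b)


theta1, theta2 = 1.0, 2.5          # illustrative values of theta' and theta''
a_const, b_const = wald_constants(0.05, 0.05)

e_11 = expected_log_ratio(1, 1, theta1, theta1, theta2)
asn_three = 2.0 + 2.0 * a_const / e_11        # formula (8)
asn_two_r2 = 2 * 2 * a_const / e_11           # formula (6) with r = 2
print(f"E(1,1;theta') = {e_11:.6f}, ASN (III) = {asn_three:.1f}, ASN (II), r = 2: {asn_two_r2:.1f}")
```

For $\nu_1 = \nu_2 = 1$ and $\theta = \theta'$ the integral can also be done in closed form, giving $\log[4\sqrt{\rho}/(1 + \sqrt{\rho})^2]$ with $\rho = \theta''/\theta'$, which serves as a check on the quadrature.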


Table I shows the ASN calculated by these formulae and also the sample
sizes required by the fixed sample tests having the same values of $\alpha'$ and $\alpha''$.
In the cases shown in the table $\alpha' = \alpha''$ and in the present instance this implies
that the ASN will be the same when either $H'$ or $H''$ is true; this common value
is shown in the table.

Comparing (6) with (8), or (7) with (9), we see that the ASN for (II) with
r = 2 will be almost double that for (III). This is to be expected since (II) with

TABLE | 


a’ =a” = 0.05 
Procedure 


2 


(11) 
(IT) 200. 


r 
r 

(II) r 158. 
r 
r 


203.8 | 142.: 
115.2 80.6 
91.3 64. 

(II 128. 74.3 | 52.3 

(II)r = 106. 61.7 43 .: 

(II) r 90. 52.3 36.8 

(III 179.5 | 108.$ 73. 

Fixed sample 184 108 76 

Girshick 89.3 51.6 36.: 


Is] & bo 


i a 
— 
r = 2 neglects all comparisons between successive groups of 2 observations from
each population. As r increases, the proportion of neglected comparisons
diminishes and the approximate ASN also diminishes. When the ASN is not large
compared with 2r, the number of additional observations obtained at each
stage, the values have of course only a formal significance. It is however of
interest to note that as r tends to infinity, the approximate ASN tend
respectively to limiting values of

(10) $-4a \Big/ \log \dfrac{(\theta' + \theta'')^2}{4\theta'\theta''}$ and $4b \Big/ \log \dfrac{(\theta' + \theta'')^2}{4\theta'\theta''}$.

These values are shown in Table I on the line r = $\infty$. It may be conjectured
that these values might, in fact, be related to the approximate ASN for
Procedure (I). It will be noted that they are considerably lower than the figures
for (II) with r equal to 2 or 3, or for (III). These latter are in fact only slightly
lower than the fixed sample sizes, and it is likely that when $\theta$ lies between $\theta'$
and $\theta''$ the ASN for these schemes would substantially exceed the fixed sample
sizes.

Girshick [3] has constructed a test discriminating between the hypotheses

$\bar{H}'(\sigma_1 = \sigma_1', \sigma_2 = \sigma_2')$ and $\bar{H}''(\sigma_1 = \sigma_2', \sigma_2 = \sigma_1')$.

This test is based on the inequalities

$\log A < \tfrac12 [(\sigma_1')^{-2} - (\sigma_2')^{-2}] \sum_{j=2}^{n} (y_{1j}^2 - y_{2j}^2) < \log B$.
j=2 


The application of this test requires a knowledge of the actual values of $\sigma_1'$ and
$\sigma_2'$, not only the ratio $\bar{\theta} = (\sigma_1'/\sigma_2')^2$. However the formula for the approximate
ASN depends only on this ratio. It is

$2 - 4a/[\bar{\theta} + \bar{\theta}^{-1} - 2]$ if $\bar{H}'$ is true,
and $-b$ replaces $a$ if $\bar{H}''$ is true.
The ratio 
6” (o1/02)” when Al” is true 
(7) ~ (oi/o2)* when A’ is true 


is equal to (6°)*. Hence if we take @ = +/6”/0’ we can compare the approximate 
ASN for Girshick’s test. with those for the various procedures described above. 
Such ASN values are shown in the last line of Table I. It is to be expected that
they will be smaller than corresponding ASN for the other procedures, since
(i) $\sigma_1'$ and $\sigma_2'$ must be known to apply Girshick's test, and (ii) the test is a
sequential probability ratio test based on the independent pairs of random
variables $(y_{1j}, y_{2j})$. The closeness of the ASN's for Girshick's test and for Procedure
(II) with r = $\infty$ is noteworthy.
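The limiting values for Procedure (II) and the approximate ASN of Girshick's test are both elementary to evaluate. The sketch below assumes $\alpha' = \alpha'' = 0.05$ and takes the Girshick ratio as $\sqrt{\theta''/\theta'}$; since the Girshick formula is unchanged when the ratio is replaced by its reciprocal, the orientation of that ratio does not affect the result. The ratios 3 and 2.5 are illustrative choices, and the function names are hypothetical.

```python
import math


def wald_constants(alpha1, alpha2):
    """The constants a and b of Section 2; alpha1 = alpha', alpha2 = alpha''."""
    log_a = math.log(alpha2 / (1.0 - alpha1))
    log_b = math.log((1.0 - alpha2) / alpha1)
    return (alpha1 * log_b + (1.0 - alpha1) * log_a,
            alpha2 * log_a + (1.0 - alpha2) * log_b)


def limiting_asn(theta1, theta2, a, b):
    """Limits of the approximate ASN of Procedure (II) as r tends to infinity."""
    c = math.log((theta1 + theta2) ** 2 / (4.0 * theta1 * theta2))
    return -4.0 * a / c, 4.0 * b / c


def girshick_asn(theta_bar, a, b):
    """Approximate ASN of Girshick's test under each of its two hypotheses."""
    d = theta_bar + 1.0 / theta_bar - 2.0
    return 2.0 - 4.0 * a / d, 2.0 + 4.0 * b / d


a_const, b_const = wald_constants(0.05, 0.05)
limit_h1, limit_h2 = limiting_asn(1.0, 3.0, a_const, b_const)
girshick_h1, girshick_h2 = girshick_asn(math.sqrt(2.5), a_const, b_const)
print(f"limit of (II): {limit_h1:.1f}, Girshick: {girshick_h1:.1f}")
```

With equal error probabilities, a = -b, so each pair of values coincides; the two printed figures are close to entries that remain legible in Table I.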


3. Comparison of variance components. We will consider a one-way
classification by groups and denote the internal (within-group) variance by $\sigma^2$ and the
external (between-group) variance by $\sigma_0^2$. This means that if $x_{ti}$ is the ith
observation from the tth group then $x_{ti} = A + u_t + z_{ti}$ where the u's and z's are
mutually independent normal variables each with expected value zero and
variances $\sigma_0^2$, $\sigma^2$ respectively; A is a constant. It is desired to discriminate
between the hypotheses $H'(\sigma_0^2/\sigma^2 = \delta')$ and $H''(\sigma_0^2/\sigma^2 = \delta'')$ with risks of error
$\alpha'$, $\alpha''$ as defined in Section 2.

Procedures analogous to (I) and (II) above have been discussed in [4]. There
are two simple alternative ways of constructing sequential procedures in this
problem: (a) taking a fixed number, k, of groups and, at each stage, taking one
further additional observation from each group; (b) at each stage selecting
(at random) a further group (or set of r groups) and taking a fixed number,
m, of observations from each group. It was found in [4] that Procedure (I) used
in conjunction with (a) will not terminate with probability one unless $\delta'$ (or $\delta''$)
is equal to zero. Procedures (II) and (III) are not applicable in conjunction with
(a) as the successive sets of observations are not independent of each other.
This section will be concerned exclusively with sequential procedures constructed
according to system (b).

PROCEDURE (I). "Start by taking two groups and m observations in each
group. At each subsequent stage (if required) one further group is chosen and
m observations taken in it.

"At the (n - 1)st stage calculate

(11) $G_n = m \sum_{t=1}^{n} (\bar{x}_t - \bar{x})^2 \Big/ \sum_{t=1}^{n} \sum_{i=1}^{m} (x_{ti} - \bar{x}_t)^2$.

Accept $H'$ if $p(G_n \mid \theta'')/p(G_n \mid \theta') < A$.
Accept $H''$ if $p(G_n \mid \theta'')/p(G_n \mid \theta') > B$.
Otherwise proceed to the next stage."
In the above statement $\theta' = 1 + m\delta'$, $\theta'' = 1 + m\delta''$ and

(12) $p(G_n \mid \theta) = \dfrac{1}{B(\tfrac12(n - 1), \tfrac12 n(m - 1))} \cdot \dfrac{\theta^{\frac12 n(m-1)}\, G_n^{\frac12(n-3)}}{(\theta + G_n)^{\frac12(nm-1)}}$

so that

$p(G_n \mid \theta'')/p(G_n \mid \theta') = (\theta''/\theta')^{\frac12 n(m-1)}\, [(\theta' + G_n)/(\theta'' + G_n)]^{\frac12(nm-1)}$.

The validity of Procedure (I) can be demonstrated by showing that it is
equivalent to a sequential likelihood ratio test based on the sequence $\{G_n\}$.
As in Section 2, Wald's approximate formulae cannot be applied in this case.

PROCEDURE (II). "At each stage take a sample of r groups and take m
observations in each group. Calculate $G_r$ by (11) for these mr values and call it
$G_{r,n}$ if based on the observations taken at the nth stage.

Accept $H'$ if $\prod_{j=1}^{n} [p(G_{r,j} \mid \theta'')/p(G_{r,j} \mid \theta')] < A$.

Accept $H''$ if $\prod_{j=1}^{n} [p(G_{r,j} \mid \theta'')/p(G_{r,j} \mid \theta')] > B$.

Otherwise proceed to the (n + 1)st stage."
In the above statement $p(G_{r,j} \mid \theta)$ is given by (12) with $G_n$ replaced by $G_{r,j}$
and n by r. Hence

$\prod_{j=1}^{n} [p(G_{r,j} \mid \theta'')/p(G_{r,j} \mid \theta')] = (\theta''/\theta')^{\frac12 nr(m-1)} \prod_{j=1}^{n} [(\theta' + G_{r,j})/(\theta'' + G_{r,j})]^{\frac12(rm-1)}$.

PROCEDURE (III). "Start by taking two groups and m observations in each
group. At each subsequent stage (if required) one further group is chosen and
m observations taken in it.

"At the (n - 1)st stage calculate

$G_n' = m(\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{n-1} - (n - 1)\bar{x}_n)^2 \Big/ n(n - 1) \sum_{i=1}^{m} (x_{ni} - \bar{x}_n)^2$.

Accept $H'$ if $\prod_{j=2}^{n} [p(G_j' \mid \theta'')/p(G_j' \mid \theta')] < A$.

Accept $H''$ if $\prod_{j=2}^{n} [p(G_j' \mid \theta'')/p(G_j' \mid \theta')] > B$.

Otherwise proceed to the nth stage."
In the above statement

(13) $p(G_j' \mid \theta) = \dfrac{1}{B(\tfrac12, \tfrac12(m - 1))} \cdot \dfrac{\theta^{\frac12(m-1)}\, (G_j')^{-\frac12}}{(\theta + G_j')^{\frac12 m}}$

so that

$\prod_{j=2}^{n} [p(G_j' \mid \theta'')/p(G_j' \mid \theta')] = (\theta''/\theta')^{\frac12(n-1)(m-1)} \prod_{j=2}^{n} [(\theta' + G_j')/(\theta'' + G_j')]^{\frac12 m}$.

(Note that the equation analogous to (4) does not hold.)

In Procedure (II) the $G_{r,j}$'s are evidently mutually independent and have a
common distribution. In Procedure (III) the quantities

$Y_j = \sqrt{m}\,(\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{j-1} - (j - 1)\bar{x}_j)/\sqrt{j(j - 1)}$

are mutually independent normal variables each with expected value zero and
variance $\sigma^2 + m\sigma_0^2$. Hence the $G_j'$'s are mutually independent and have the
common distribution (13). In both (II) and (III) therefore we can use Wald's
approximate formulae for the ASN. The results of applying these formulae
are given below.

For Procedure (II), the approximate ASN are
(14) $mra/E(r - 1, r(m - 1); \theta')$, when $H'$ is true,
(15) $mrb/E(r - 1, r(m - 1); \theta'')$, when $H''$ is true.

For Procedure (III), the approximate ASN are
(16) $m + ma/E(1, m - 1; \theta')$, when $H'$ is true,
(17) $m + mb/E(1, m - 1; \theta'')$, when $H''$ is true.

Table II shows the ASN calculated by these formulae and also the sample
sizes required by the fixed sample tests having the same values of $\alpha'$ and $\alpha''$.
As r increases the ASN of (II) decreases. As r tends to infinity the values given
by (14) and (15) for the approximate ASN tend respectively to

(18) $\dfrac{2ma}{(m - 1) \log \dfrac{\theta''}{\theta'} + m \log \dfrac{m\theta'}{(m - 1)\theta'' + \theta'}}$ and $\dfrac{2mb}{(m - 1) \log \dfrac{\theta''}{\theta'} + m \log \dfrac{(m - 1)\theta' + \theta''}{m\theta''}}$.

These values are shown in Table II on the lines r = $\infty$.


TABLE II 


True 
Hypoth Procedure 


esis 


H’ Il 
(Il 
Il 
(III 


II 
II) 7 
II 
Ill 


Fixed 


Il 
(IL) + 
IIT) 


(II) 7 
(il 


709.1 644 
II : 291. { 7 ( 494.9 461. 
Il) 7 d Jz 423.0 400 
Il x : 7; 204.5) 220.2) 254.7, 278.9) 277 


aul ( 3) 20% 555.3 447.1 


II = 3 675.4 560 
II)r=; 3 244.: : 476.1 407! 
II) r = 208 . t 410.2 353, 
(dl) + : : 4) 278.9, 270.3 
(IIT) f ‘ ‘ 555.3 424.1 
Any Fixed 32 0 3% 664 624 
H II)r =2 ‘ f - 263.4 251 
(ll = © . : 103.2 109 
Il d 204.1 172 
Il) ‘ 4 243.1 
Il 0. , 54.3 | : 7 103.2 2.3 106.0 


204.1 53.9 - 


246 
It may be conjectured that these values might, in fact, be related to the
approximate ASN for procedure (I).

For any given value of m, comparisons between Procedures (II) and (III)
and the fixed sample procedure are mostly similar to the comparisons in Section
2, where variances in two different populations were being compared. However
there is, in the present case, the possibility of choosing a most suitable value
for m. Except for Procedure (II) when $H'$ specifies $\delta' = 0$ and the
counter-hypothesis $H''$ is true, the figures shown in Table II indicate the existence of a
minimum ASN corresponding to a fairly small value of m. In the exceptional
cases just described, the approximate ASN formulae give results which decrease
as m increases. Since the minimum possible sample number is in fact mr there
will be, however, a value of m for which the ASN is actually minimized in these
cases also, though it may be expected to be rather higher than in the other
cases of Table II.

Usually the cost of choosing a further group and the cost of taking an ob- 
servation within a group will both enter the calculations so that the minimiza- 
tion of the ASN may not be a primary objective. If, for example, the second of 
these two costs is much smaller than the first, then values of m rather larger 
than those minimizing the ASN would be preferable. 


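4. Formulae used in calculation of approximate ASN. The evaluation of
formulae (6) to (9) and (14) to (17) (leading to Tables I and II respectively)
involved the calculation of the quantities $E(\nu_1, \nu_2; \theta)$. The formulae used in these
calculations are recorded in this Section. From (5)

(19) $E(\nu_1, \nu_2; \theta) = \dfrac{\nu_2}{2} \log \dfrac{\theta''}{\theta'} + \dfrac{\nu_1 + \nu_2}{2} \cdot \dfrac{\theta^{\frac12 \nu_2}}{B(\tfrac12 \nu_1, \tfrac12 \nu_2)} \int_0^{\infty} \dfrac{g^{\frac12 \nu_1 - 1}}{(\theta + g)^{\frac12(\nu_1 + \nu_2)}} \log \dfrac{\theta' + g}{\theta'' + g}\, dg$.

Expanding the logarithm in the integrand in various ways, the following
expressions for $E(\nu_1, \nu_2; \theta)$ are obtained:

(20) $\dfrac{\nu_2}{2} \log \dfrac{\theta''}{\theta'} - \dfrac{\nu_1 + \nu_2}{2} \sum_{j=1}^{\infty} \dfrac{1}{j} \cdot \dfrac{\nu_2(\nu_2 + 2) \cdots (\nu_2 + 2j - 2)}{(\nu_1 + \nu_2)(\nu_1 + \nu_2 + 2) \cdots (\nu_1 + \nu_2 + 2j - 2)} \left[\left(1 - \dfrac{\theta'}{\theta}\right)^j - \left(1 - \dfrac{\theta''}{\theta}\right)^j\right]$
     for $0 < \theta'/\theta \le 2$; $0 < \theta''/\theta \le 2$,

(21) $-\dfrac{\nu_1}{2} \log \dfrac{\theta''}{\theta'} - \dfrac{\nu_1 + \nu_2}{2} \sum_{j=1}^{\infty} \dfrac{1}{j} \cdot \dfrac{\nu_1(\nu_1 + 2) \cdots (\nu_1 + 2j - 2)}{(\nu_1 + \nu_2)(\nu_1 + \nu_2 + 2) \cdots (\nu_1 + \nu_2 + 2j - 2)} \left[\left(1 - \dfrac{\theta}{\theta'}\right)^j - \left(1 - \dfrac{\theta}{\theta''}\right)^j\right]$
     for $0 < \theta/\theta' \le 2$; $0 < \theta/\theta'' \le 2$.

With $\theta = \theta'$, (20) gives an expression for $E(\nu_1, \nu_2; \theta')$ valid for
$0 < \theta''/\theta' \le 2$; (21) gives an expression valid for $0 < \theta'/\theta'' \le 2$. Thus one or
the other will always be valid, sometimes both. When both are valid, one of the
two is usually much more rapidly convergent than the other. In some cases it
is advantageous to expand $\log(\theta''/\theta')$ in powers of $(1 - \theta''/\theta')$ or $(1 - \theta'/\theta'')$
and to combine the two infinite series into one. For example, one formula
obtained from (20) in this way is

(22) $E(\nu_1, \nu_2; \theta') = -\tfrac12 \nu_2 \sum_{j=2}^{\infty} \dfrac{1}{j} \left[1 - \dfrac{(\nu_2 + 2)(\nu_2 + 4) \cdots (\nu_2 + 2j - 2)}{(\nu_1 + \nu_2 + 2) \cdots (\nu_1 + \nu_2 + 2j - 2)}\right] \left(1 - \dfrac{\theta''}{\theta'}\right)^j$

(for $0 < \theta''/\theta' \le 2$).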
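The combined series (22) is simple to sum term by term, because the ratio of products can be updated with one multiplication per term. The sketch below is an illustration, not the author's computing scheme; the function name and the choice of 200 terms are assumptions. For $\nu_1 = \nu_2 = 1$ the integral in (19) can be evaluated in closed form as $\log[4\sqrt{\rho}/(1 + \sqrt{\rho})^2]$ with $\rho = \theta''/\theta'$, which gives an independent check on the series.

```python
import math


def series_22(nu1, nu2, ratio, terms=200):
    """E(nu1, nu2; theta') from the series (22); ratio = theta''/theta', 0 < ratio <= 2."""
    x = 1.0 - ratio
    product = 1.0   # empty product, so the j = 1 term contributes nothing
    total = 0.0
    for j in range(1, terms + 1):
        if j >= 2:
            # Extend the product by the factor (nu2 + 2j - 2) / (nu1 + nu2 + 2j - 2).
            product *= (nu2 + 2 * (j - 1)) / (nu1 + nu2 + 2 * (j - 1))
        total += (1.0 - product) * x ** j / j
    return -0.5 * nu2 * total


ratio = 1.5   # illustrative value of theta''/theta'
from_series = series_22(1, 1, ratio)
closed_form = math.log(4.0 * math.sqrt(ratio) / (1.0 + math.sqrt(ratio)) ** 2)
print(from_series, closed_form)
```

Convergence is geometric when the ratio is well inside (0, 2) and slows down as the ratio approaches 2, which is the situation in which the alternative expansion (21) may be preferable.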


5. Comments. The following remarks are intended to elucidate certain points 
of detail which have been omitted in Sections 2 and 3, where the development 
was more or less formal. 

In the successive stages of each of the procedures described it would usually
be more convenient to take logarithms in the inequalities which are used. Thus,
for example, in Procedure (III) of Section 2 the function

$\sum_{j=2}^{n} [\log(\theta' + g_j') - \log(\theta'' + g_j')]$

can be compared with the critical limits

$\log A - \tfrac12(n - 1) \log(\theta''/\theta'), \qquad \log B - \tfrac12(n - 1) \log(\theta''/\theta')$.
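The logarithmic form lends itself to a running computation in which one term is added at each stage. The sketch below is illustrative only: it assumes the stagewise ratios $g_j'$ have already been computed, and the function name and the returned pair (decision, number of observations per population) are choices made here.

```python
import math


def procedure_three_log_form(ratios, theta1, theta2, alpha1, alpha2):
    """Apply Procedure (III) of Section 2 on the logarithmic scale.

    ratios holds g'_2, g'_3, ...; theta1 = theta', theta2 = theta'',
    alpha1 = alpha', alpha2 = alpha''.
    """
    log_a = math.log(alpha2 / (1.0 - alpha1))
    log_b = math.log((1.0 - alpha2) / alpha1)
    shift = 0.5 * math.log(theta2 / theta1)
    running = 0.0
    for stage, g in enumerate(ratios, start=1):   # stage equals n - 1
        running += math.log(theta1 + g) - math.log(theta2 + g)
        if running < log_a - stage * shift:
            return "accept H'", stage + 1
        if running > log_b - stage * shift:
            return "accept H''", stage + 1
    return "continue", len(ratios) + 1


# Small ratios favour the smaller variance ratio theta', large ones favour theta''.
print(procedure_three_log_form([1.0] * 20, 1.0, 4.0, 0.05, 0.05))
print(procedure_three_log_form([1e9] * 10, 1.0, 4.0, 0.05, 0.05))
```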


In Section 3, Procedure (III) could be modified by using $\sum_{t=1}^{n} \sum_{i=1}^{m} (x_{ti} - \bar{x}_t)^2$
in the denominator of $G_n'$ in place of $\sum_{i=1}^{m} (x_{ni} - \bar{x}_n)^2$ and altering $p(G_n' \mid \theta)$
accordingly. The approximate ASN formulae (16) and (17) would not apply,
of course, but the ASN of the modified test presumably would be less than that
of the original test.

It is likely that Procedure (I) will be preferable to either (II) or (III) even if
the ASN are not as small as the values given by the conjectural formulae (10)
and (18). Procedures (II) and (III) are not here proposed as serious
alternatives to (I) but rather to provide a background to help in assessing (I).

In the case $\delta' = 0$ of Section 3 there is the further alternative of using a
sampling scheme based on a fixed number of groups and an increasing number of
observations in each group, following a sequential procedure analogous to (I).
While often this will be preferable, for practical reasons, it appears intuitively
that the ASN of such a procedure will exceed that of procedure (I) of Section
3 based on increasing the number of groups.

For higher order hierarchal classifications (e.g. observations $x_{tij}$ of structure
$x_{tij} = A + v_t + w_{ti} + z_{tij}$) the problems are similar to those studied in this
paper. There will, of course, be more variables in the choice of procedure. Just
as the number of observations per group was an added variable in Section 3
as compared with Section 2, so there will now be the added variable of the
number of second-order (w) groups per first-order (v) group, and similarly for
higher order classifications.
ON QUADRATIC ESTIMATES OF VARIANCE COMPONENTS


By FRANKLIN A. GRAYBILL


Oklahoma Agricultural and Mechanical College and Iowa State College


1. Summary. In this paper quadratic estimates of variance components are 
considered. For the general balanced nested classification with no specific distribu- 
tions assumed, it is shown that the quadratic estimate which is unbiased and 
which has minimum variance is given by the analysis of variance method of 
estimating the variance components. 


2. Introduction. Consider the general balanced nested classification which is
described by the model

$Y_{ijk \cdots p} = \mu + a_i + b_{ij} + c_{ijk} + d_{ijkm} + \cdots + e_{ijk \cdots p}$.

If all the components are fixed except $e_{ijk \cdots p}$, and if the $e_{ijk \cdots p}$ are uncorrelated
random variables with means zero, variance $\sigma^2$ and with finite fourth moments,
then Hsu [1] has shown that the best (minimum variance) quadratic unbiased
estimate of $\sigma^2$ is given by the analysis of variance method of estimating $\sigma^2$.

If we assume a distribution for the $e_{ijk \cdots p}$, then we can get the maximum
likelihood estimate for $\sigma^2$. However, this method does not tell us how the
variance of this estimate compares with the variance of estimates obtained by other
methods for finite-sized samples (except for efficient estimates).

If $a_i, b_{ij}, c_{ijk}, \cdots, e_{ijk \cdots p}$ are uncorrelated random variables with means
zero, variances $\sigma_a^2, \sigma_b^2, \sigma_c^2, \cdots$, and $\sigma^2$ respectively, then Crump [2] gives
methods for estimating $\sigma_a^2, \sigma_b^2, \sigma_c^2, \cdots$, and $\sigma^2$ by making an analysis of
variance table, equating expected to observed mean squares and using the
solutions to these equations as the estimates. These estimates are quadratic functions
of the observations and are unbiased, but very little has been said about the
size of the variance of these estimates relative to estimates given by other
methods of estimation. Proofs will be given for the two-fold classification only,
but they generalize without great difficulty. Theorems will be stated for the
general case.

3. Notations and definitions. Consider the linear model

(3.1) $Y_{ijk} = \mu + a_i + b_{ij} + e_{ijk}, \quad i = 1 \cdots N_1;\ j = 1 \cdots N_2;\ k = 1 \cdots N_3$,

where $Y_{ijk}$ is the observation; $\mu$ is a constant; and the $a_i$, $b_{ij}$, and $e_{ijk}$ are
independent chance variables with means zero, variances $\sigma_a^2$, $\sigma_b^2$, and $\sigma^2$, respectively,
third moments $\beta_a^3$, $\beta_b^3$, and $\beta^3$, respectively, and finite fourth moments $\alpha_a^4$, $\alpha_b^4$,
and $\alpha^4$, respectively. The problem is to estimate $\sigma_b^2$ by quadratic functions of the
observation $Y_{ijk}$.
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DrFINITION 1. By the best quadratic unbiased estimate (BQUE) of o, we 
will mean a quadratic form Q which satisfies the following: 
(3.2) (a) EQ = 0, (b) var Q S var Q*, 
where Q* is any other quadratic form in the observations satisfying (a). (2 de- 
notes mathematical expectation, var denotes variance.) 

DEFINITION 2. Let a general quadratic form Q be denoted by 


Q - > Y; k Vet Qisk- 


tjk;rst 


The quadratic form given by the analysis of variance method of estimating 
S. ” , at t 
o, is M = 7 iit Yreem;5, , Where the elements m?*; are as follows: 
f 


-1} 
(s ——~ 
v N, N2 Ni(N2 — 1) 


i; 8 + p 


rat | 


4 Nik = (b) N.N.NAN, — 1) j;tFk, 
1 2 3\4v3 ™ ) 


{ (c) O all other values of ij/erst. 


The following relations can be obtained from (3.3):

(3.4)
(a) $\sum_{ijk} m^{rst}_{ijk} = 0$ for any $r, s, t$. (b) $\sum_{rst} m^{rst}_{ijk} = 0$ for any $i, j, k$.
(c) $\sum_{jk} m^{rst}_{ijk} = 0$ for any $r, s, t, i$. (d) $\sum_{st} m^{rst}_{ijk} = 0$ for any $i, j, k, r$.
(e) $\sum_{ij;tk} m^{ijt}_{ijk} = 1$.
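The weights in (3.3) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes the usual balanced nested mean squares, so that the analysis of variance estimate of $\sigma_b^2$ is (MS between subclasses within classes minus MS within subclasses) divided by $N_3$. The function names `anova_weight`, `quadratic_form`, and `anova_estimate` and the sample sizes are illustrative assumptions.

```python
import itertools
import random


def anova_weight(i, j, k, r, s, t, n1, n2, n3):
    """Coefficient of Y_ijk * Y_rst in the ANOVA estimate of sigma_b^2, following (3.3)."""
    if i != r:
        return 0.0                                     # case (c)
    if j != s:
        return -1.0 / (n1 * n2 * n3 ** 2 * (n2 - 1))   # case (a)
    if k != t:
        return 1.0 / (n1 * n2 * n3 * (n3 - 1))         # case (b)
    return 0.0                                         # case (c): diagonal elements


def quadratic_form(y, n1, n2, n3):
    """Evaluate M = sum over ijk;rst of Y_ijk * Y_rst * m."""
    idx = list(itertools.product(range(n1), range(n2), range(n3)))
    return sum(anova_weight(*a, *b, n1, n2, n3) * y[a] * y[b]
               for a in idx for b in idx)


def anova_estimate(y, n1, n2, n3):
    """ANOVA estimate of sigma_b^2 from the two mean squares (assumed standard form)."""
    cell = {(i, j): sum(y[i, j, k] for k in range(n3)) / n3
            for i in range(n1) for j in range(n2)}
    group = {i: sum(cell[i, j] for j in range(n2)) / n2 for i in range(n1)}
    ms_b = n3 * sum((cell[i, j] - group[i]) ** 2
                    for i in range(n1) for j in range(n2)) / (n1 * (n2 - 1))
    ms_e = sum((y[i, j, k] - cell[i, j]) ** 2
               for i in range(n1) for j in range(n2) for k in range(n3)) / (n1 * n2 * (n3 - 1))
    return (ms_b - ms_e) / n3


# Illustrative sizes and arbitrary data; any balanced layout with N_2, N_3 >= 2 works.
random.seed(0)
n1, n2, n3 = 2, 3, 3
y = {idx: random.gauss(0.0, 1.0)
     for idx in itertools.product(range(n1), range(n2), range(n3))}

# Relation (3.4e): the weights with r = i, s = j sum to one.
trace_like = sum(anova_weight(i, j, k, i, j, t, n1, n2, n3)
                 for i in range(n1) for j in range(n2)
                 for k in range(n3) for t in range(n3))
```

Under these assumptions the quadratic form and the mean-square computation agree up to rounding, and the sum in (3.4e) equals one.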


THEOREM I. In the model given by (3.1) the BQUE estimate of $\sigma_b^2$ is given by
the method of analysis of variance, that is, by the quadratic form $M$ stated in (3.3).

PROOF. Let

(3.5) $D = \sum_{ijk;rst} Y_{ijk} Y_{rst}\, d^{rst}_{ijk}$, where $q^{rst}_{ijk} = m^{rst}_{ijk} + d^{rst}_{ijk}$,

which implies $Q = M + D$. The problem is to give exact specification to each
$q^{rst}_{ijk}$ so that $Q$ satisfies (3.2) or, since the $m^{rst}_{ijk}$ are known, the problem reduces to
giving exact specification to each $d^{rst}_{ijk}$. To satisfy (3.2a) we must have $E(Q) =
E(M) + E(D) = \sigma_b^2$. This reduces to

(3.6) $\mu^2 \sum_{R} (d^{rst}_{ijk} + m^{rst}_{ijk}) + \sigma_a^2 \sum_{R(r=i)} (m^{rst}_{ijk} + d^{rst}_{ijk}) + \sigma_b^2 \sum_{R(r=i;\,j=s)} (d^{rst}_{ijk} + m^{rst}_{ijk}) + \sigma^2 \sum_{R(r=i;\,j=s;\,k=t)} (d^{rst}_{ijk} + m^{rst}_{ijk}) = \sigma_b^2,$

where the summation index $R$ refers to summing over all values of the indexes
with the restriction in parentheses. For example, under the second summation
sign the symbol $R(r = i)$ means sum over $i = 1, \ldots, N_1$; $r = 1, \ldots, N_1$; $j = 1, \ldots,
N_2$; $s = 1, \ldots, N_2$; $k = 1, \ldots, N_3$; $t = 1, \ldots, N_3$; with the restriction $r = i$.
Equating coefficients in (3.6) and using (3.3), we get the following conditions
on $d^{rst}_{ijk}$:

(3.7) (a) $\sum_{R} d^{rst}_{ijk} = 0$, (b) $\sum_{R(r=i)} d^{rst}_{ijk} = 0$, (c) $\sum_{R(r=i;\,j=s)} d^{rst}_{ijk} = 0$, (d) $\sum_{R(r=i;\,j=s;\,k=t)} d^{rst}_{ijk} = 0$.

(3.8) $\operatorname{var} Q = E\Big[\sum_R Y_{ijk} Y_{rst} (m^{rst}_{ijk} + d^{rst}_{ijk})\Big]^2 - \sigma_b^4 = E\Big[\sum_R Y_{ijk} Y_{rst}\, m^{rst}_{ijk}\Big]^2 + E\Big[\sum_R Y_{ijk} Y_{rst}\, d^{rst}_{ijk}\Big]^2 + 2E\Big[\sum_R Y_{ijk} Y_{rst}\, m^{rst}_{ijk} \cdot \sum_P Y_{fgh} Y_{uvw}\, d^{uvw}_{fgh}\Big] - \sigma_b^4,$


where the index $P$ is defined for the subscripts $f, g, h, u, v, w$, the same as $R$
was defined for $i, j, k, r, s, t$. We will examine the cross-product term of (3.8) in detail.
Let us first examine the quantity

(3.9) $\sum_R Y_{ijk} Y_{rst}\, m^{rst}_{ijk}.$

Substituting (3.1) into (3.9) we get

$$\sum_R (\mu + a_i + b_{ij} + e_{ijk})(\mu + a_r + b_{rs} + e_{rst})\, m^{rst}_{ijk},$$

which in virtue of (3.4c) and (3.4d) reduces to

$$\sum_R (b_{ij} + e_{ijk})(b_{rs} + e_{rst})\, m^{rst}_{ijk}.$$

Thus the third term of (3.8), omitting the coefficient 2, reduces to

(3.10) $E\Big[\sum_R \sum_P (b_{ij} + e_{ijk})(b_{rs} + e_{rst})(\mu + a_f + b_{fg} + e_{fgh})(\mu + a_u + b_{uv} + e_{uvw})\, m^{rst}_{ijk}\, d^{uvw}_{fgh}\Big].$
Due to (3.7a), this term, which we may denote by $C$, will contain no terms involving
$\mu^2$. Due to (3.7b), $C$ will contain no terms involving $a_f^2$, since this would force
$u = f$ in $d^{uvw}_{fgh}$ and $\sum_{P(u=f)} d^{uvw}_{fgh} = 0$ by (3.7b). Since the $a_i$, $b_{ij}$, and $e_{ijk}$
are independent with means zero, the expectation of any term in $C$ involving
these elements linearly will be zero. Neither can $C$ contain a term which involves
$\mu$ linearly, since this would be one of the forms $\mu\beta^3$ or $\mu\beta_b^3$. But terms such as
$\mu\beta^3$ could come only from elements of $e_{ijk}$ and $e_{rst}$ when $i = r$; $j = s$; and $k = t$.
When these corresponding subscripts are equal the term vanishes by (3.3c).
Also terms such as $\mu\beta_b^3$ could come only from elements such as $\mu b_{ij} b_{rs} b_{fg}$ with
$i = r = f$ and $j = s = g$, or $\mu b_{ij} b_{rs} b_{uv}$ with $i = r = u$ and $j = s = v$. Thus we
would get (from $\mu b_{ij} b_{rs} b_{uv}$)

$$\mu\beta_b^3 \sum_{ijkt} \sum_{fghw} m^{ijt}_{ijk}\, d^{ijw}_{fgh} = \mu\beta_b^3 \sum_{ijw;fgh} d^{ijw}_{fgh} \sum_{kt} m^{ijt}_{ijk} = \mu\beta_b^3\, \frac{1}{N_1 N_2} \sum_{uvw;fgh} d^{uvw}_{fgh}.$$

This is zero by (3.7a). Now let us examine terms of $C$ which lead to terms
involving $\sigma_b^2\sigma^2$. This can come from only six terms:

(1) $E \sum_R \sum_P b_{ij} b_{rs} e_{fgh} e_{uvw}\, m^{rst}_{ijk}\, d^{uvw}_{fgh} = \sigma_b^2\sigma^2 \sum_{ijkt;uvw} m^{ijt}_{ijk}\, d^{uvw}_{uvw}.$

This is zero by (3.7d).

(2) $E \sum_R \sum_P b_{fg} b_{uv} e_{ijk} e_{rst}\, m^{rst}_{ijk}\, d^{uvw}_{fgh} = \sigma_b^2\sigma^2 \sum_{ijk;uvwh} m^{ijk}_{ijk}\, d^{uvw}_{uvh}.$

This is zero by (3.3c).

(3) $E \sum_R \sum_P b_{ij} b_{uv} e_{rst} e_{fgh}\, m^{rst}_{ijk}\, d^{uvw}_{fgh} = \sigma_b^2\sigma^2 \sum_{ijkw;rst} m^{rst}_{ijk}\, d^{ijw}_{rst}.$

This is zero by (3.3) unless $r = i$. If $r = i$, we get

$$\sum_{ijkw;st} m^{ist}_{ijk}\, d^{ijw}_{ist} = \sum_{ijwt;\,s \ne j} d^{ijw}_{ist} \sum_k m^{ist}_{ijk} + \sum_{ijwt} d^{ijw}_{ijt} \sum_k m^{ijt}_{ijk}.$$

Using (3.3) this becomes

$$\frac{-1}{N_1 N_2 N_3 (N_2 - 1)} \sum_{ijwt;\,s \ne j} d^{ijw}_{ist} + \frac{1}{N_1 N_2 N_3} \sum_{ijwt} d^{ijw}_{ijt}.$$

This is zero by (3.7b) and (3.7c). The remaining three terms are:

(4) $E \sum_R \sum_P b_{rs} b_{fg} e_{ijk} e_{uvw}\, m^{rst}_{ijk}\, d^{uvw}_{fgh} = \sigma_b^2\sigma^2 \sum_{rst;ijkh} m^{rst}_{ijk}\, d^{ijk}_{rsh},$

(5) $E \sum_R \sum_P b_{ij} b_{fg} e_{rst} e_{uvw}\, m^{rst}_{ijk}\, d^{uvw}_{fgh} = \sigma_b^2\sigma^2 \sum_{ijkh;rst} m^{rst}_{ijk}\, d^{rst}_{ijh},$

(6) $E \sum_R \sum_P b_{rs} b_{uv} e_{ijk} e_{fgh}\, m^{rst}_{ijk}\, d^{uvw}_{fgh} = \sigma_b^2\sigma^2 \sum_{ijkw;rst} m^{rst}_{ijk}\, d^{rsw}_{ijk}.$

Each of these three, (4), (5), and (6), is zero by the same argument as used for
part (3). Thus we have reduced $C$ to two terms, (A) and (B):


$$(A) = E \sum_R \sum_P b_{ij} b_{rs} b_{fg} b_{uv}\, m^{rst}_{ijk}\, d^{uvw}_{fgh},$$
$$(B) = E \sum_R \sum_P e_{ijk} e_{rst} e_{fgh} e_{uvw}\, m^{rst}_{ijk}\, d^{uvw}_{fgh}.$$

Let us examine these in detail. We see that (A) is zero unless the subscripts
fit one of the following four models:

A.1 $i = r$; $j = s$; $f = u$; $g = v$ where either $i \ne f$ or $j \ne g$.
A.2 $i = f$; $j = g$; $r = u$; $s = v$ where either $i \ne r$ or $j \ne s$.
A.3 $i = u$; $j = v$; $r = f$; $s = g$ where either $i \ne r$ or $j \ne s$.
A.4 $i = r = f = u$; $j = s = g = v$.

For case A.1, (A) reduces to zero by (3.7c). In case A.2, if $i \ne r$, then (A) =
0 by (3.3). If $i = r$, we get

$$(A) = \sigma_b^4 \sum_{ijkh;\,stw;\,s \ne j} m^{ist}_{ijk}\, d^{isw}_{ijh} = \sigma_b^4 \left(\frac{-1}{N_1 N_2 (N_2 - 1)}\right) \sum_{ijh;\,sw;\,s \ne j} d^{isw}_{ijh},$$

which is zero by (3.7b) and (3.7c). For case A.3, (A) is zero by an argument
identical with that for case A.2. For case A.4, we get

$$(A) = \alpha_b^4 \sum_{ijkt;hw} m^{ijt}_{ijk}\, d^{ijw}_{ijh},$$

which is zero by (3.7c). Thus (A) is equal to zero.
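Now for (B). We immediately see that (B) is zero unless the subscripts fit one
of the following four cases.

B.1 $i = r$; $j = s$; $k = t$; $f = u$; $g = v$; $h = w$ where $i \ne f$; $j \ne g$; or $k \ne h$.
B.2 $i = f$; $j = g$; $k = h$; $r = u$; $s = v$; $t = w$ where either $i \ne r$; $j \ne s$; or $k \ne t$.
B.3 $i = u$; $j = v$; $k = w$; $r = f$; $s = g$; $t = h$ where either $i \ne r$; $j \ne s$; or $k \ne t$.
B.4 $i = r = f = u$; $j = s = g = v$; $k = t = h = w$.

For case B.1 we get

$$(B) = \sigma^4 \sum_{ijk;fgh} m^{ijk}_{ijk}\, d^{fgh}_{fgh} \quad \text{with either } i \ne f;\ j \ne g;\ \text{or } k \ne h;$$

but this is zero by (3.7d). For case B.2 we get

$$(B) = \sigma^4 \sum_{ijk;rst} m^{rst}_{ijk}\, d^{rst}_{ijk} \quad \text{with either } i \ne r;\ j \ne s;\ \text{or } k \ne t.$$

If $i \ne r$, this is zero by (3.3c). If $i = r$, we get

$$(B) = \sigma^4 \sum_{ijk;st} m^{ist}_{ijk}\, d^{ist}_{ijk} \quad \text{with either } j \ne s \text{ or } k \ne t.$$

If $j \ne s$, we get, using (3.3a),

$$(B) = \sigma^4 \left(\frac{-1}{N_1 N_2 N_3^2 (N_2 - 1)}\right) \sum_{ijk;\,st;\,s \ne j} d^{ist}_{ijk},$$

which is zero by (3.7b) and (3.7c). If $j = s$ and $k \ne t$, we get

$$(B) = \sigma^4 \left(\frac{1}{N_1 N_2 N_3 (N_3 - 1)}\right) \sum_{ijk;\,t \ne k} d^{ijt}_{ijk},$$

which is zero by (3.7c) and (3.7d). Thus (B) is equal to zero for case B.2. For
case B.3 (B) is zero by an argument identical with the argument for B.2. For
case B.4 we get

$$(B) = \alpha^4 \sum_{ijk} m^{ijk}_{ijk}\, d^{ijk}_{ijk}.$$

But this is zero by (3.3c). Thus (B) is equal to zero.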
We have thus proved that $C$ is zero, and (3.8) becomes

$$\operatorname{var} Q = E\Big[\sum_R Y_{ijk} Y_{rst}\, m^{rst}_{ijk}\Big]^2 + E\Big[\sum_R Y_{ijk} Y_{rst}\, d^{rst}_{ijk}\Big]^2 - \sigma_b^4,$$

but this is a minimum if $d^{rst}_{ijk} = 0$. These values are also consistent with the
conditions of unbiasedness. This proves the theorem. By methods similar to above, it
can be shown that the BQUE estimate of $\sigma_a^2$ (or of $\sigma^2$) is given by the analysis
of variance method of estimating variance components.


4. The k-fold classification. Let the random variable $Y_{i_1 i_2 \cdots i_k}$ be given by

(4) $Y_{i_1 i_2 \cdots i_k} = \mu + a^{(1)}_{i_1} + a^{(2)}_{i_1 i_2} + \cdots + a^{(k)}_{i_1 i_2 \cdots i_k},$

where $i_t = 1, \ldots, n_t$; and $\mu$ is a constant and the $a^{(t)}$ are independent random
variables with the following properties:

$$E\big(a^{(t)}_{i_1 i_2 \cdots i_t}\big) = 0, \qquad \operatorname{Var} a^{(t)}_{i_1 i_2 \cdots i_t} = \sigma_t^2, \qquad E\big(a^{(t)}_{i_1 i_2 \cdots i_t}\big)^4 = \alpha_t^4 < \infty.$$

This is the general balanced nested (hierarchal) classification.

THEOREM II. The best quadratic unbiased estimate of $\sigma_t^2$ (denoted by $\hat\sigma_t^2$) in the
model defined by (4) is given by the analysis of variance.

THEOREM III. The variance of $\hat\sigma_t^2$ is given by


2(ne Ness — 1) E k—1 k i an k : | 
WN = Deans = 7 eet 1 ~ a > Now Now Tp oq + > N5+19p 


pmt+l q=p+1 p=t+l 


* 1 2Nt ‘ 
— > Newioe + y~ (nu — 801) + ot, 
Nin — L) qmt+1 i(M¢ - 1) 
where $N_u = n_u n_{u+1} \cdots n_k$ if $u \le k$ and $N_u = 1$ if $u > k$.

THEOREM IV. Under the model defined in (4), the best quadratic unbiased
estimate of $\sum_{i=1}^{k} g_i \sigma_i^2$ (where the $g_i$ are known constants) is given by $\sum_{i=1}^{k} g_i \hat\sigma_i^2$.
This is proven by a method similar to the method used to prove Theorem I.

THEOREM V. The variance of $\big[\sum_{i=1}^{k} g_i \hat\sigma_i^2\big]$ is
Satrardt— AS es Fae] 
2 a2 ‘Yi+l r 2 
gi Var oi — - N re) | ° 
i=1 ; N, i=l N inal o( Nia _ 1) $4 
SOME THEOREMS RELEVANT TO LIFE TESTING FROM AN
EXPONENTIAL DISTRIBUTION


By B. Epstein and M. Sobel
Wayne University


1. Introduction and Summary. A life test on $N$ items is considered in which the
common underlying distribution of the length of life of a single item is given by
the density

(1) $p(x; \theta, A) = \begin{cases} \dfrac{1}{\theta}\, e^{-(x - A)/\theta}, & \text{for } x \ge A, \\ 0, & \text{otherwise,} \end{cases}$

where $\theta > 0$ is unknown but is the same for all items and $A \ge 0$. Several lemmas
are given concerning the first $r$ out of $n$ observations when the underlying p.d.f.
is given by (1). These results are then used to estimate $\theta$ when the $N$ items are
divided into $k$ sets $S_j$ (each containing $n_j > 0$ items, $\sum_{j=1}^{k} n_j = N$) and each
set $S_j$ is observed only until the first $r_j$ failures occur ($0 < r_j \le n_j$). The
constants $r_j$ and $n_j$ are fixed and preassigned. Three different cases are considered:

1. The $n_j$ items in each set $S_j$ have a common known $A_j$ $(j = 1, 2, \ldots, k)$.

2. All $N$ items have a common unknown $A$.

3. The $n_j$ items in each set $S_j$ have a common unknown $A_j$ $(j = 1, 2, \ldots, k)$.

The results for these three cases are such that the results for any intermediate
situation (i.e. some $A_j$ values known, the others unknown) can be written down
at will. The particular case $k = 1$ and $A = 0$ is treated in [2].

The constant $A$ in (1) can be interpreted in two different ways:

(i) $A$ is the minimum life, that is, life is measured from the beginning of time,
which is taken as zero.

(ii) $A$ is the "time of birth", that is, life is measured from time $A$. Under
interpretation (ii) the parameter $\theta$, which we are trying to estimate, represents
the expected length of life.


2. Statement of results. Three lemmas are given concerning the smallest $r$
ordered observations out of $n$ independent observations on the common
distribution (1). Although they are called lemmas because of their relation to the
problem at hand, they are of interest in themselves.

A uniformly minimum variance unbiased estimate $\theta_i^*$ of $\theta$ together with its
distribution is given for each case $i = 1, 2, 3$. This estimate is the unique
unbiased estimate based on a sufficient statistic. In each case $i$ $(i = 1, 2, 3)$ it is
given by $\theta_i^* = C_i \hat\theta_i$, where $C_i$ is a constant and $\hat\theta_i$ is the maximum likelihood
(m.l.) estimate, given in (13), (14), and (17) below. If $R = \sum_{j=1}^{k} r_j$ is the total
number of failures observed, then it is shown that $2R\hat\theta_i/\theta$ is distributed as $\chi^2$
with $2R$, $2(R - 1)$, and $2(R - k)$ degrees of freedom in cases 1, 2, and 3,
respectively.

In each case the estimate $\theta_i^*$ (or $\hat\theta_i$) depends on the $k$-tuples $(r_1, r_2, \ldots, r_k)$
and $(n_1, n_2, \ldots, n_k)$ and in case 1 on the known $A$ values. But it is shown that
the distribution of the estimate depends only on $R$, $\theta$ (and in case 3 also on $k$)
and is otherwise independent of the $k$-tuple $(r_1, r_2, \ldots, r_k)$. The distribution
is independent also of the $k$-tuple $(n_1, n_2, \ldots, n_k)$, of $N$, and in case 1 of the
known $A$ values. Clearly this means that there are many ways of dividing the
$N$ items into $k$ sets and of taking a total of $R$ observations, all of which give
equivalent estimates of $\theta$. This equivalence is not with respect to the time
required to obtain the estimate, but with respect to any properties depending on
the distribution of the estimate.


3. Derivation of results in Section 2. Let $X_1 \le X_2 \le \cdots \le X_r$ denote the
$r$ smallest ordered observations from a set of $n$ independent observations on the
common distribution (1). In life testing, $X_i$, the $i$th smallest failure, is also the
$i$th observation taken so that a sample like the above is obtained by merely
stopping the experiment immediately after the $r$th observation. The set of $r$
random variables under discussion represents a typical set $S_j$ described above
with the subscript $j$ dropped. The joint p.d.f. of $X_1, X_2, \ldots, X_r$ is

(2) $p(x_1, x_2, \ldots, x_r) = \begin{cases} \dfrac{n!}{(n - r)!\, \theta^r} \exp\left\{ -\dfrac{1}{\theta} \left[ \sum_{i=1}^{r} (x_i - A) + (n - r)(x_r - A) \right] \right\}, & A \le x_1 \le x_2 \le \cdots \le x_r < \infty, \\ 0, & \text{otherwise.} \end{cases}$

Unless explicitly stated otherwise, any set $X_1, X_2, \ldots, X_r$ of the first $r$ of $n$
observations considered below will have density (2).

We now state a series of preliminary lemmas and corollaries.
Most proofs are direct and hence omitted.

LEMMA 1. For $1 \le s < r \le n$, the conditional joint density of

(3) $X_{s+1} - x_s,\ X_{s+2} - x_s,\ \ldots,\ X_r - x_s$

given $X_s = x_s$ (as well as the unconditional joint density) is (2) with $(n, r, A)$
replaced by $(n - s, r - s, 0)$ respectively.

LEMMA 2. For $1 \le r \le n$ and for any preassigned constant $c \ge A$ the conditional
joint density of the set

(4) $X_i - c \quad (i = 1, 2, \ldots, r)$

given that $X_1 \ge c$, is (2) with $A$ replaced by zero.

COROLLARY 1. If $c$ is replaced by a random variable $C$, independent of the $X_i$,
whose range is the interval $[A, \infty]$, then the conditional joint density of $X_i - C$ given
that $X_1 \ge C$ is the same as in Lemma 2.

LEMMA 3. For $1 \le r \le n$ the set of random variables

(5) $W_i = (n - i + 1)(X_i - X_{i-1}) \quad (i = 1, 2, \ldots, r)$

(where $X_0$ is defined as the constant $A$) are mutually independent with common
p.d.f. (1) except that $A = 0$.

PROOF. Utilizing the fact that for $r = 1, 2, \ldots, n$

(6) $\sum_{i=1}^{r} (X_i - A) + (n - r)(X_r - A) = \sum_{i=1}^{r} W_i,$

the result is immediate if the transformation (5) is carried out in (2).
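The identity (6) is purely algebraic and can be checked on any ordered sample. The sketch below is illustrative only; the helper names `normalized_spacings` and `total_life` and the parameter values are assumptions, not notation from the paper.

```python
import random


def normalized_spacings(x_sorted, n, A):
    """W_i = (n - i + 1) * (X_i - X_{i-1}) for i = 1..r, with X_0 = A, as in (5)."""
    w, prev = [], A
    for i, x in enumerate(x_sorted, start=1):
        w.append((n - i + 1) * (x - prev))
        prev = x
    return w


def total_life(x_sorted, n, A):
    """Left-hand side of (6): sum of (X_i - A) plus (n - r) * (X_r - A)."""
    r = len(x_sorted)
    return sum(x - A for x in x_sorted) + (n - r) * (x_sorted[-1] - A)


# Illustrative values: n items on test, observation stopped at the r-th failure.
random.seed(1)
theta, A, n, r = 2.0, 0.5, 10, 4
sample = sorted(A + random.expovariate(1.0 / theta) for _ in range(n))
first_r = sample[:r]
w = normalized_spacings(first_r, n, A)
```

Here `sum(w)` agrees with `total_life(first_r, n, A)` up to rounding, which is the content of (6); the independence and common exponential law of the $W_i$ asserted in Lemma 3 are distributional statements that this check does not verify.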


COROLLARY 2. For $1 \le r \le n$, if

(7) $V = \sum_{i=1}^{r} (X_i - A) + (n - r)(X_r - A)$

then $2V/\theta$ is distributed as $\chi^2(2r)$.

PROOF. By Lemma 3 and (6), $V$ is a sum of $r$ independent, identically
distributed exponential variables $W_i$. Since $2W_i/\theta$ is a $\chi^2(2)$ for each $i$, the
corollary follows.

COROLLARY 3. For $1 < r \le n$, if

(8) $V' = \sum_{i=1}^{r} (X_i - X_1) + (n - r)(X_r - X_1),$

then the conditional distribution of $2V'/\theta$ given $X_1 = x_1$ (as well as the unconditional
distribution) is $\chi^2(2r - 2)$. The random variables $V'$ and $X_1$ are independent.

PROOF. The "unconditional" result follows from the fact that

(9) $V' = \sum_{i=2}^{r} W_i.$

By Lemma 3 each of $W_2, W_3, \ldots, W_r$ is independent of $W_1$ and hence of $X_1$
and the corollary follows.

COROLLARY 4. For $1 \le r \le n$ and any preassigned constant $c \ge A$, if

(10) $V^* = \sum_{i=1}^{r} (X_i - c) + (n - r)(X_r - c),$

then the conditional distribution of $2V^*/\theta$ given $X_1 \ge c$ (as well as the unconditional
distribution) is $\chi^2(2r)$.

PROOF. By Lemma 2 the conditional joint density of $X_i^* = X_i - c$ given
$X_1 \ge c$ is the same as the joint density of $X_i - A$ $(i = 1, 2, \ldots, r)$. Hence
the conditional distribution of $2V^*/\theta$ must be the same as the distribution of $2V/\theta$,
namely $\chi^2(2r)$. Since the result is independent of $c$ it is also the unconditional
distribution.

COROLLARY 5. If $c$ is replaced by a random variable $C$, independent of the $X_i$,
whose range is the interval $[A, \infty]$ then again the conditional distribution of $2V^*/\theta$
given $X_1 \ge C$ is $\chi^2(2r)$. The random variables $V^*$ and $C$ are independent.

THEOREM 1. The distribution of $\hat\theta$, the m.l. estimate, depends only on $R$, $\theta$ (and
in case 3 also on $k$). The random variable $2R\hat\theta/\theta$ is distributed as $\chi^2(2R)$,
$\chi^2(2R - 2)$ and $\chi^2(2R - 2k)$ in cases 1, 2, and 3 respectively.

PROOF. In case 1 the joint p.d.f. of the $R$ observed $x$'s is

(11) $\begin{cases} B\, \theta^{-R} \exp\left\{ -\sum_{j=1}^{k} V_j / \theta \right\}, & \text{if } A_j \le x_{j1} \le x_{j2} \le \cdots \le x_{j r_j} \ (j = 1, 2, \ldots, k), \\ 0, & \text{otherwise,} \end{cases}$

where $B$ is independent of $\theta$ and

(12) $V_j = \sum_{i=1}^{r_j} (X_{ji} - A_j) + (n_j - r_j)(X_{j r_j} - A_j).$

The m.l. estimate $\hat\theta_1$ of $\theta$ is easily shown to be

(13) $\hat\theta_1 = \sum_{j=1}^{k} V_j / R.$

From Corollary 2 and the independence of the $V_j$, it follows that $2R\hat\theta_1/\theta =
\sum_{j=1}^{k} 2V_j/\theta$ is distributed as $\chi^2(2R)$.

In case 2 it can be readily verified that

(14) $\hat\theta_2 = \sum_{j=1}^{k} V_j^* / R$

where

(15) $V_j^* = \sum_{i=1}^{r_j} (X_{ji} - \hat A) + (n_j - r_j)(X_{j r_j} - \hat A)$

and $\hat A$ is the smallest of the $R$ observed $X$'s. Let $S_{j_0}$ denote the set containing
$\hat A$. By Corollary 3 the distribution of $2V_{j_0}^*/\theta$ is $\chi^2(2r_{j_0} - 2)$ where $\chi^2(0)$ is to
be interpreted as the sure constant zero. For any other set $S_j$ $(j \ne j_0)$ it follows
from Corollary 5 that the distribution of $2V_j^*/\theta$ is $\chi^2(2r_j)$ and is independent of
$\hat A$. All the random variables $V_j^*$ are independent and hence

(16) $2R\hat\theta_2/\theta = \sum_{j=1}^{k} 2V_j^*/\theta$

is distributed as $\chi^2(2R - 2)$. Since $V_{j_0}^*$ is also independent of $\hat A$ by Corollary
3 it follows that $\hat\theta_2$ and $\hat A$ are independent.

In case 3 one easily computes

(17) $\hat\theta_3 = \sum_{j=1}^{k} V_j' / R$

where

(18) $V_j' = \sum_{i=1}^{r_j} (X_{ji} - X_{j1}) + (n_j - r_j)(X_{j r_j} - X_{j1}).$

By Corollary 3 the distribution of $2V_j'/\theta$ is $\chi^2(2r_j - 2)$ for each $j$ (where $\chi^2(0)$
is to be interpreted as the sure constant zero). Hence

(19) $2R\hat\theta_3/\theta = \sum_{j=1}^{k} 2V_j'/\theta$

is distributed as $\chi^2(2R - 2k)$. In this case one needs $R > k$ observations to
obtain an estimate of $\theta$ or, since $r_j \ge 1$ for all $j$, one needs $r_j > 1$ for at least
one $j$. This completes the proof of Theorem 1.
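Define

(20) $T_1 = \sum_{j=1}^{k} \left[ \sum_{i=1}^{r_j} X_{ji} + (n_j - r_j) X_{j r_j} \right] = R\hat\theta_1 + \sum_{j=1}^{k} n_j A_j,$

(21) $T_2 = (T_{20}, T_{21})$ where $T_{20} = T_1$ and $T_{21} = \min_j X_{j1}$,

(22) $T_3 = (T_{30}, T_{31}, \ldots, T_{3k})$ where $T_{30} = T_1$ and $T_{3j} = X_{j1}$ for $j = 1, 2, \ldots, k$.

The unbiased estimates $\theta_i^*$ for cases 1, 2, and 3 respectively are given by

(23) $\theta_1^* = \hat\theta_1, \quad \theta_2^* = R\hat\theta_2/(R - 1) \quad \text{and} \quad \theta_3^* = R\hat\theta_3/(R - k).$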
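The estimates (13), (14), (17) and their unbiased versions (23) can be computed directly from the censored samples. The following is a small sketch under assumed data structures: each set is a tuple holding $n_j$ and the list of its first $r_j$ ordered failure times (plus the known $A_j$ in case 1). The function names and the numbers are illustrative, not from the paper.

```python
def v_stat(x, n, origin):
    """Sum of (X_i - origin) over the r observed failures plus (n - r)(X_r - origin)."""
    r = len(x)
    return sum(xi - origin for xi in x) + (n - r) * (x[-1] - origin)


def theta_case1(sets):
    """Case 1: sets is a list of (n_j, x_j, A_j) with A_j known. Returns theta-hat_1 (already unbiased)."""
    R = sum(len(x) for _, x, _ in sets)
    return sum(v_stat(x, n, A) for n, x, A in sets) / R


def theta_case2(sets):
    """Case 2: common unknown A. Returns (theta-hat_2, theta*_2) from (14) and (23)."""
    R = sum(len(x) for _, x in sets)
    a_hat = min(x[0] for _, x in sets)          # smallest of the R observed failures
    theta_hat = sum(v_stat(x, n, a_hat) for n, x in sets) / R
    return theta_hat, R * theta_hat / (R - 1)


def theta_case3(sets):
    """Case 3: separate unknown A_j. Returns (theta-hat_3, theta*_3) from (17) and (23)."""
    R = sum(len(x) for _, x in sets)
    k = len(sets)
    theta_hat = sum(v_stat(x, n, x[0]) for n, x in sets) / R
    return theta_hat, R * theta_hat / (R - k)     # requires R > k


# Hypothetical data: two sets with n = 3, r = 2 and n = 2, r = 2.
data = [(3, [1.0, 2.0]), (2, [1.5, 4.0])]
est1 = theta_case1([(3, [1.0, 2.0], 0.5), (2, [1.5, 4.0], 0.5)])
est2 = theta_case2(data)
est3 = theta_case3(data)
```

For these numbers $R = 4$ and $N = 5$, so case 2 gives $\hat\theta_2 = 5.5/4$ and $\theta_2^* = 5.5/3$, while case 3 gives $\hat\theta_3 = 4.5/4$ and $\theta_3^* = 4.5/2$.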


It can be quickly verified that $\theta_i^*$ depends on the observations only through
$T_i$ $(i = 1, 2, 3)$. Hence, to show that the $\theta_i^*$ are uniformly minimum variance
unbiased estimates it suffices [3] to show that $T_i$ is complete and sufficient for
estimating $\theta$ in each case $i$ $(i = 1, 2, 3)$. The proof for case 3 is similar to that
for case 2 and is omitted. To prove completeness we will need the following
uniqueness theorem for one-sided Laplace transforms (see [1] and [3]): "If

(24) $\int_0^\infty f(t)\, e^{-t/\theta}\, dt = 0$ for all $\theta > 0$

then $f(t) = 0$ for almost all $t > 0$."

THEOREM 2. $T_1$ is sufficient and complete for estimating $\theta$.

PROOF. The sufficiency follows from the fact that the joint density in case 1
can be written as

(25) $C\, \theta^{-R} \exp\left[ -\Big( T_1 - \sum_{j=1}^{k} n_j A_j \Big) \Big/ \theta \right] \prod_{j=1}^{k} f_j(X_{j1}, X_{j2}, \ldots, X_{j r_j}; A_j)$

where $C$ is constant, and for each $j$ $(j = 1, 2, \ldots, k)$

(26) $f_j(X_{j1}, \ldots, X_{j r_j}; A_j) = \begin{cases} 1, & \text{if } A_j \le X_{j1} \le \cdots \le X_{j r_j} < \infty, \\ 0, & \text{otherwise.} \end{cases}$

If we let $A^* = \sum_{j=1}^{k} n_j A_j$ then (since $X_{ji} \ge A_j$ for each $i$ and $j$) $T_1 \ge A^*$.
Let $p_\theta(t)$ denote the density of $T_1$. To prove completeness it has to be shown
that if

(27) $\int_{A^*}^{\infty} f(t)\, p_\theta(t)\, dt = 0$ for all $\theta > 0$

then $f(t) = 0$ for almost all $t > A^*$. Letting $t_1 = t - A^*$ and

$$f^*(t_1) = t_1^{R-1} f(t_1 + A^*)$$

and using the result of Theorem 1 that $2t_1/\theta$ is distributed as $\chi^2(2R)$, then (27)
takes the form

(28) $\int_0^\infty f^*(t_1)\, e^{-t_1/\theta}\, dt_1 = 0$ for all $\theta > 0$.

It follows from the uniqueness theorem for one-sided Laplace transforms that
$f^*(t_1) = 0$ for almost all $t_1 > 0$. Hence $f(t) = f(t_1 + A^*) = 0$ for almost all
$t > A^*$. This proves that $T_1$ is complete.

COROLLARY 6. $\theta_1^* = \hat\theta_1 = (T_1 - A^*)/R$ is the unique uniformly minimum
variance unbiased estimate of $\theta$.

PROOF. This is a direct consequence of Theorem 2 and the theorem on page
321 of [3].

THEOREM 3. $T_2 = (T_{20}, T_{21})$ is sufficient and complete for estimating the pair
$(\theta, A)$.

PROOF. The sufficiency follows from the fact that the joint density in case 2
can be written as

(29) $C\, \theta^{-R} e^{-(T_{20} - NA)/\theta} f(T_{21}, A) \prod_{j=1}^{k} f_j(X_{j1}, X_{j2}, \ldots, X_{j r_j}; T_{21}).$

Here $C$ is constant,

(30) $f(T_{21}, A) = \begin{cases} 1, & \text{if } T_{21} \ge A, \\ 0, & \text{otherwise,} \end{cases}$

and the $f_j$ are defined in (26).

To show that $T_2$ is complete it has to be shown that if

(31) $\int_A^\infty \int_{N t_{21}}^\infty f(t_{20}, t_{21})\, p_{\theta, A}(t_{20}, t_{21})\, dt_{20}\, dt_{21} = 0$ for all $\theta > 0$ and all $A \ge 0$,

then $f(t_{20}, t_{21}) = 0$ almost everywhere in the region $t_{21} > 0$, $t_{20} > N t_{21}$. Let
$u = t_{20} - N t_{21}$ and $t = t_{21}$. By Theorem 1 we have that $2u/\theta$ is distributed as
$\chi^2(2R - 2)$ and is independent of $t$. Moreover, since $t = \min_j X_{j1}$, $2N(t - A)/\theta$
is distributed as $\chi^2(2)$ by Lemma 3. Then (31), after some cancellation, takes the
form

(32) $\int_A^\infty \int_0^\infty u^{R-2} e^{-(u + Nt)/\theta} f(u + Nt, t)\, du\, dt = 0$, for all $\theta > 0$ and all $A \ge 0$.

It thus follows directly from a two-dimensional uniqueness theorem for Laplace
transforms that

(33) $f(t_{20}, t_{21}) = f(u + Nt, t) = 0$, for all $\theta > 0$ and $A \ge 0$,

almost everywhere in the region $t_{21} > 0$, $t_{20} > N t_{21}$. Thus completeness of $T_2$ is
established.

COROLLARY 7. $\theta_2^* = R\hat\theta_2/(R - 1) = (T_{20} - N T_{21})/(R - 1)$ is the unique
uniformly minimum variance unbiased estimate of $\theta$.

PROOF. Unbiasedness of $\theta_2^*$ is easy to verify. The assertion is a consequence of
Theorem 3 and the theorem in [3] cited in Corollary 6.


4. Confidence intervals on $\theta$ and $A$ in case 2. Since $2(T_{20} - N T_{21})/\theta$ is
distributed as $\chi^2(2R - 2)$, it is clear that confidence intervals on $\theta$ which do not
involve $A$ can be found. The following result concerning $A$ is a corollary of
Theorem 3.

COROLLARY 8. A unique uniformly minimum variance unbiased estimate of $A$ in
Case 2 based on $(T_{20}, T_{21})$ is given by

(34) $A^* = T_{21} - \dfrac{T_{20} - N T_{21}}{N(R - 1)}.$

PROOF. It is readily verified that $A^*$ has expectation $A$. Hence from the
completeness of the sufficient pair $(T_{20}, T_{21})$ it follows as before that $A^*$ is the unique
uniformly minimum variance unbiased estimate of $A$. The minimum variance is
$\sigma^2_{A^*} = R\theta^2 / N^2 (R - 1)$.

To get confidence limits which do not involve $\theta$, let us introduce the random
variable $U$, where

(35) $U = N(T_{21} - A)/(T_{20} - N T_{21}).$

Since the numerator and denominator are independent by Theorem 1, it is
readily shown that the p.d.f. of $U$ is given by

(36) $f(u) = (R - 1)/(1 + u)^R, \quad 0 < u < \infty.$

Since $f(u)$ is independent of $\theta$, for confidence coefficient $1 - \alpha$ we solve the equation

(37) $1 - \alpha = \int_0^c f(u)\, du = 1 - \dfrac{1}{(1 + c)^{R-1}}$

or

(38) $c = \alpha^{-1/(R-1)} - 1.$

Thus confidence limits on $A$ are

(39) $t_{21} - c\,(t_{20} - N t_{21})/N < A < t_{21}.$

These limits do not involve $\theta$ and are shortest in length for a given confidence
in the class of confidence intervals based on $U$. The latter property is established
by first noting from (35) that all possible confidence intervals on $A$ are obtained
by equating the probability in some interval of $U$ values, say $(c_1, c_2)$, to $1 - \alpha$.
The confidence interval then takes the form

(40) $t_{21} - c_2 (t_{20} - N t_{21})/N < A < t_{21} - c_1 (t_{20} - N t_{21})/N.$

To minimize its length it suffices to minimize $c_2 - c_1$ (i.e. to find the shortest
interval of $U$ values containing probability $1 - \alpha$). Since the density (36) is
strictly decreasing it is evident that the minimum is obtained by taking $c_1 = 0$.
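As a worked illustration of (34), (38), and (39), the sketch below computes the unbiased estimate $A^*$ and the confidence limits on $A$ from the pair $(t_{20}, t_{21})$. The function names and the input numbers are hypothetical; the formulas are those displayed above, with $1 - \alpha$ taken as the confidence coefficient.

```python
def a_star(t20, t21, N, R):
    """Unbiased estimate of A in case 2, following (34)."""
    return t21 - (t20 - N * t21) / (N * (R - 1))


def confidence_limits_A(t20, t21, N, R, alpha):
    """Limits (39) on A with confidence coefficient 1 - alpha, using c from (38)."""
    c = alpha ** (-1.0 / (R - 1)) - 1.0
    lower = t21 - c * (t20 - N * t21) / N
    return lower, t21


# Hypothetical observed values: N = 5 items, R = 4 failures in total.
t20, t21, N, R = 10.5, 1.0, 5, 4
estimate = a_star(t20, t21, N, R)
limits = confidence_limits_A(t20, t21, N, R, alpha=0.125)
```

With $\alpha = 0.125$ and $R = 4$, (38) gives $c = 0.125^{-1/3} - 1 = 1$, so the interval is $1 - 5.5/5 < A < 1$. Since the model requires $A \ge 0$, a negative lower limit may in practice be replaced by zero; that truncation is a remark added here, not a step stated in the paper.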


5. Related results. We now indicate some connections between the results in
Lemmas 1, 2, and 3 and some recent work [4] on ordered observations on a
uniformly distributed random variable. It is easy to show that if $Y$ is uniformly
distributed on the interval $[0, B]$ then

(41) $X = A - \theta \log (Y/B)$

has the exponential distribution (1). It follows from the monotonicity of the
log that an initial ordered set of $r$ out of $n$ exponential random variables
corresponds to a terminal ordered set of $r$ out of $n$ uniform random variables. S.
Malmquist [4] has pointed out that by virtue of the transformation (41),
independence in Lemma 3 implies and is implied by a corresponding result for
rectangularly distributed variables. By using the transformation (41) one could
prove (this is not done in [4]) analogues of Lemmas 1 and 2 for the rectangular
case. Specifically let $Y_\nu$ be the $\nu$th largest among $n$ independent observations on
the uniformly distributed random variable $Y$, then

(i) the random variables

(42) $Z_\nu = Y_{\nu+1}/Y_s \quad (\nu = s, s + 1, \ldots, r - 1;\ 1 \le s < r \le n)$

are jointly distributed like the $r - s$ largest (ordered) observations out of a set
of $n - s$ independent uniform random variables on the unit interval $[0, 1]$.

(ii) for any preassigned constant $c \le B$ the conditional random variables

(43) $Z_\nu^* = Y_\nu/c \quad (\nu = 1, 2, \ldots, r;\ 1 \le r \le n)$

given $Y_1 \le c$ are jointly distributed like the $r$ largest (ordered) observations
out of a set of $n$ independent uniform random variables on the unit interval
$[0, 1]$.

Alternatively, if these results are shown independently they furnish another
proof of the lemmas.


6. Conclusion and an application. In this paper we have given a number of
results which are useful in making estimates of $\theta$ based on life test information
from one or more sets of data, where the underlying probability law is the
two-parameter exponential distribution (1). If (1) is the underlying p.d.f., then

(44) $\log \dfrac{1}{1 - P(x; \theta, A_j)} = \dfrac{x - A_j}{\theta}$

where $P(x; \theta, A_j) = \Pr\{X \le x; \theta, A_j\}$. Thus it is clear that cases 1, 2, and 3
are equivalent to assuming that the theoretical life distributions in the various
sets $S_j$ will plot either as parallel straight lines or as the same straight line on
the semi-logarithmic scale suggested by (44). The results of this paper serve to
give a procedure for estimating the slope (common slope) of the line (lines). $A_j$
can be interpreted as the sensitivity limit at the appropriate stress level.

NOTES


APPROXIMATION METHODS WHICH CONVERGE WITH
PROBABILITY ONE$^{1,2}$


By Julius R. Blum
University of California, Berkeley

1. Introduction and Summary. Let $H(y \mid x)$ be a family of distribution functions
depending upon a real parameter $x$, and let $M(x) = \int_{-\infty}^{\infty} y\, dH(y \mid x)$ be the
corresponding regression function. It is assumed $M(x)$ is unknown to the
experimenter, who is, however, allowed to take observations on $H(y \mid x)$ for any
value $x$.

Robbins and Monro [1] give a method for defining successively a sequence
$\{x_n\}$ such that $x_n$ converges to $\theta$ in probability, where $\theta$ is a root of the equation
$M(x) = \alpha$ and $\alpha$ is a given number. Wolfowitz [2] generalizes these results, and
Kiefer and Wolfowitz [3] solve a similar problem in the case when $M(x)$ has a
maximum at $x = \theta$.

Using a lemma due to Loève [4], we show that in both cases $x_n$ converges to
$\theta$ with probability one, under weaker conditions than those imposed in [2] and
[3]. Further we solve a similar problem in the case when $M(x)$ is the median
of $H(y \mid x)$.


2. Approximation of the root of a regression equation. Let $M(x)$ be the
regression function corresponding to the family $H(y \mid x)$. Assume $M(x)$ is a
Lebesgue-measurable function satisfying:

A. $|M(x)| \le c + d\,|x|$;

B. $\int_{-\infty}^{\infty} (y - M(x))^2\, dH(y \mid x) \le \sigma^2 < \infty$;

C. $M(x) < \alpha$ for $x < \theta$, $M(x) > \alpha$ for $x > \theta$;

D. $\inf_{\delta_1 \le |x - \theta| \le \delta_2} |M(x) - \alpha| > 0$ for every pair of numbers $(\delta_1, \delta_2)$
with $0 < \delta_1 < \delta_2 < \infty$.
Received 1/22/53, revised 7/13/56.

$^1$ This paper was prepared with the support of the Office of Ordnance Research, U. S.
$^2$ Kiefer and Wolfowitz had proved the main result of this paper in the Robbins-Monro
case with bounded random variables. Their shorter proof for this case proceeds from the
fact that a subsequence of $\{x_n\}$ converges to $\theta$ with probability one, and that $P\{\liminf
x_n < c < d < \limsup x_n\} > 0$ for $\theta < c < d$ or $c < d < \theta$, yields an estimate of $\sum a_n^2$
(see equation (9) of [J. Wolfowitz, Ann. Math. Stat., Vol. 23 (1952), pp. 457-461]) which
implies its divergence. This is a contradiction which proves the desired result.


Let $\{a_n\}$ be a sequence of positive numbers such that

(2.1) (a) $\sum_{n=1}^{\infty} a_n = \infty$, (b) $\sum_{n=1}^{\infty} a_n^2 < \infty$.

Let $x_1$ be an arbitrary number. Define a sequence of random variables
recursively by

(2.2) $x_{n+1} = x_n + a_n(\alpha - y_n)$

where $y_n$ is a random variable distributed according to $H(y \mid x_n)$. We use
throughout a special case of Lemma 5.2 in [4]. We state it here as

LEMMA 1. Let $\{v_n\}$ be a sequence of random variables such that $\sum_{n=1}^{\infty} E v_n^2 < \infty$.
Then $\sum_{j=1}^{\infty} (v_j - E(v_j \mid v_1, \ldots, v_{j-1}))$ converges to a random variable with
probability one.

LEMMA 2. If B and (2.1b) hold, then the sequence $\{x_{n+1} - \sum_{j=1}^{n} a_j[\alpha - M(x_j)]\}$
converges to a random variable with probability one.

PROOF. If we let $v_j = a_j[y_j - M(x_j)]$ then $E\{v_j^2\} = a_j^2 E\{(y_j - M(x_j))^2\} \le
a_j^2 \sigma^2$. Hence $\sum_{j=1}^{\infty} E\{v_j^2\} < \infty$, by (2.1b), and Lemma 1 applies. Next we show

(2.3) $E\{y_n - M(x_n) \mid y_1 - M(x_1), \ldots, y_{n-1} - M(x_{n-1})\} = 0.$

To see this we note that given $x_1$ (a constant) we are given $M(x_1)$. But given
$y_1 - M(x_1)$ and $M(x_1)$ we are given $x_2$, etc. But since $E\{y_n - M(x_n) \mid x_n\} = 0$,
(2.3) follows. Thus we obtain that

(2.4) $\sum_{j=1}^{n} a_j(y_j - M(x_j))$ converges with probability one.

But this is clearly equivalent with the statement of the lemma, since $x_{n+1} =
x_1 + \sum_{j=1}^{n} a_j(\alpha - y_j)$.
LEMMA 3. If A, B, C, and (2.1b) hold, then $x_n$ converges with probability one.

PROOF. To begin with we establish

(2.5) $P\{\lim x_n = +\infty\} + P\{\lim x_n = -\infty\} = 0.$

For suppose $\{x_n\}$ is a sample sequence with $\lim_{n \to \infty} x_n = +\infty$. Then we have
$x_n \le \theta$ for only finitely many $n$. Hence for $n$ sufficiently large we have
$a_n(\alpha - M(x_n)) < 0$ from C. But then $\lim_{n \to \infty} [x_{n+1} - \sum_{j=1}^{n} a_j(\alpha - M(x_j))] =
+\infty$. But this can only happen with probability zero, from Lemma 2, establishing
(2.5). Now suppose the conclusion of the lemma is false. Then by virtue of
Lemma 2 and (2.5) there exists a set of sample sequences of positive probability
with the following properties:

(2.6) (a) $x_{n+1} - \sum_{j=1}^{n} a_j(\alpha - M(x_j))$ converges to a finite number,
(b) $\liminf x_n < \limsup x_n$

for every sample sequence in the set. Let $\{x_n\}$ be such a sequence and assume
$\limsup x_n > \theta$. (A similar argument handles the situation $\limsup x_n \le \theta$.)
Then we may choose numbers $a$ and $b$ satisfying

(2.7) $a > \theta, \quad \liminf x_n < a < b < \limsup x_n.$

Since $\lim_{n \to \infty} a_n = 0$ from (2.1b) and because of (2.6) we may choose $N$ so large
that $N \le n < m$ implies

(2.8) (a) $a_n \le \min\left[ \dfrac{1}{3d},\ \dfrac{b - a}{3\,[\,|\alpha| + c + d\,|\theta|\,]} \right]$,
(b) $\Big| x_m - x_n - \sum_{j=n}^{m-1} a_j(\alpha - M(x_j)) \Big| \le \dfrac{b - a}{3}.$

Now choose $m$ and $n$ such that

(2.9) (a) $N \le n < m$,
(b) $x_n < a$, $x_m > b$,
(c) $n < j < m$ implies $a \le x_j \le b$.

This can clearly be done. Then we obtain

(2.10) $x_m - x_n \le \dfrac{b - a}{3} + \sum_{j=n}^{m-1} a_j(\alpha - M(x_j)) \le \dfrac{b - a}{3} + a_n(\alpha - M(x_n))$

since for $n < j < m$, (2.9) and C imply $a_j(\alpha - M(x_j)) < 0$. If $\theta < x_n$, we obtain

(2.11) $x_m - x_n \le (b - a)/3$

which is a contradiction to (2.9). Suppose then that $\theta \ge x_n$. Applying A we
have

(2.12) $|M(x_n)| \le c + d\,|x_n| \le c + d\,|\theta| + d\,|\theta - x_n| \le c + d\,|\theta| + d(x_m - x_n).$

Hence by applying (2.10) we have

(2.13) $x_m - x_n \le (b - a)/3 + a_n[\,|\alpha| + c + d\,|\theta|\,] + a_n d(x_m - x_n).$

Thus $x_m - x_n \le 2(b - a)/3(1 - a_n d) \le b - a$ by (2.8). But this is again a
contradiction to (2.9), proving the lemma.

THEOREM 1. If conditions A through (2.1b) hold, then $x_n$ converges to $\theta$ with
probability one.

PROOF. Suppose $P\{\lim_n x_n = x\} = 1$, as guaranteed by Lemma 3, and suppose
further

(2.14) $P\{x \ne \theta\} > 0.$

Then we may choose $\epsilon_1$ and $\epsilon_2$ with, say, $\theta < \epsilon_1 < \epsilon_2 < \infty$ such that

(2.15) $P\{\epsilon_1 < x < \epsilon_2\} > 0.$

(Otherwise we may choose $\epsilon_1$ and $\epsilon_2$ with $-\infty < \epsilon_1 < \epsilon_2 < \theta$.) Then for every
sample sequence $\{x_n\}$ for which $\lim_n x_n = x$, with $\epsilon_1 < x < \epsilon_2$, we have

(2.16) $\epsilon_1 \le x_n \le \epsilon_2$

for all $n$ sufficiently large so that the set of sample sequences $\{x_n\}$ satisfying
(2.16) has positive probability. But from Lemma 2 and Lemma 3, we have for
almost all such sequences that

(2.17) $\sum_{j=1}^{n} a_j(\alpha - M(x_j))$ converges.

But this is contradicted by D and (2.1a), proving the theorem.
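The recursion (2.2) is easy to simulate. The following is a minimal sketch under assumptions chosen for illustration: the step sizes are $a_n = 1/n$ (which satisfy (2.1)), the regression function is $M(x) = 2x - 1$ with $\alpha = 3$ so that $\theta = 2$, and the noise is Gaussian with unit variance. The function name `robbins_monro` and the argument `sample_y` are hypothetical.

```python
import random


def robbins_monro(sample_y, alpha, x1, n_steps):
    """Iterate x_{n+1} = x_n + a_n * (alpha - y_n) with a_n = 1/n, as in (2.2)."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n
        x = x + a_n * (alpha - sample_y(x))
    return x


def regression(x):
    """Assumed regression function M(x) = 2x - 1; it satisfies conditions A, C, and D."""
    return 2.0 * x - 1.0


rng = random.Random(0)

# Degenerate case: observations equal M(x) exactly (no noise).
x_exact = robbins_monro(regression, alpha=3.0, x1=0.0, n_steps=50)

# Noisy case: y_n = M(x_n) + Gaussian error with variance 1, so condition B holds.
x_noisy = robbins_monro(lambda x: regression(x) + rng.gauss(0.0, 1.0),
                        alpha=3.0, x1=0.0, n_steps=20000)
```

In the noise-free run the iterate lands on $\theta = 2$ after two steps and stays there; in the noisy run it settles close to 2, which is consistent with, though of course not a proof of, the almost sure convergence asserted by Theorem 1.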


3. Approximation of the maximum of a regression function. Let M(x) again 
be a measurable regression function satisfying B and also 

E. M(z) is strictly increasing for x < @, and strictly decreasing for x > @. 

F. There exist positive numbers p and R such that | 2’ — 2” | < p implies 
| M(x’) — M(x”)| < R. 

G. For every 6 > 0 there exists a positive number (6) such that | z — 6| > 6 
implies 





inf | M (x + €) — M(z — «)| — 


6/2>e>0 € 


(6). 


Let (f,} and {c,} be sequences of positive numbers satisfying 
H. (i) c, + 0; (ii) °F. a, = ©; (iti) S°F1 (an/en)”? < @. 


We define a recursive scheme as follows. Let x, be an arbitrary number. Define 
Tn4+1 = In + (an/Cn) (Yon — Yon—1) 


where Y2, and Yo,; are independent random variables distributed according to 
H(y | a, + en) and H(y | 2, — ¢,) respectively. Then we have 

THEOREM 2. If conditions B and E through H hold, then P\lim z, = 0} = 1. 

The proof of the theorem will be omitted here. It consists in repeating the 
proofs of Lemma 2, Lemma 3, and Theorem 1, with obvious modifications. 

We note that conditions B and E through H represent a weakening of the 
conditions imposed in [3], since conditions (2.5) and (2.8) of that paper are not 
used here. 
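
The recursive scheme of this section can be sketched numerically. The following is a minimal illustration, not part of the original paper: the regression function $M(x) = -|x - \theta|$, the Gaussian noise, the choices $a_n = 1/n$ and $c_n = n^{-1/3}$, and all function names are assumptions made only for the example. These choices are intended to satisfy E, F, G and H, since $\sum (a_n/c_n)^2 = \sum n^{-4/3} < \infty$.

```python
import random


def kiefer_wolfowitz(sample_y, x1, n_steps,
                     a=lambda n: 1.0 / n,
                     c=lambda n: n ** (-1.0 / 3.0)):
    """Iterate x_{n+1} = x_n + (a_n / c_n) * (Y_{2n} - Y_{2n-1}).

    sample_y(x) must return one observation drawn from H(y | x).
    The default sequences a_n, c_n are illustrative assumptions.
    """
    x = x1
    for n in range(1, n_steps + 1):
        a_n, c_n = a(n), c(n)
        y_minus = sample_y(x - c_n)  # Y_{2n-1}, drawn from H(y | x_n - c_n)
        y_plus = sample_y(x + c_n)   # Y_{2n},   drawn from H(y | x_n + c_n)
        x = x + (a_n / c_n) * (y_plus - y_minus)
    return x


def make_tent_sampler(theta, noise_sd, seed):
    """Hypothetical model: y = -|x - theta| + Gaussian noise (maximum at theta)."""
    rng = random.Random(seed)

    def sample_y(x):
        return -abs(x - theta) + rng.gauss(0.0, noise_sd)

    return sample_y


# Start away from the maximum and let the scheme run.
estimate = kiefer_wolfowitz(make_tent_sampler(theta=2.0, noise_sd=0.5, seed=0),
                            x1=0.0, n_steps=5000)
```

With these assumed settings the iterate should settle near $\theta = 2$; the theorem itself only asserts convergence with probability one, not a rate.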


4. Estimation of the value at which a conditional median vanishes. 

Suppose $H(y \mid x)$ is a family of distribution functions such that, for a given
number $\alpha$, $H(\alpha \mid x)$ is a measurable function of $x$. Assume $M(x)$ is the (not
necessarily unique) median of $H(y \mid x)$. We assume the following conditions on $M(x)$
and $H(\alpha \mid x)$:

(4.1)  $M(x) < \alpha$ for $x < \theta$, $M(x) > \alpha$ for $x > \theta$.

By this we mean that if $x$ is less than $\theta$, then every median of $H(y \mid x)$ is less than
$\alpha$, and similarly if $x$ is greater than $\theta$.

(4.2)  $\inf_{\delta_1 \le |x - \theta| \le \delta_2} |H(\alpha \mid x) - \tfrac12| > 0$

for every pair of numbers $(\delta_1, \delta_2)$ with $0 < \delta_1 < \delta_2 < \infty$. Let $\{a_n\}$ be a sequence
of positive numbers such that

(4.3)  $\sum_{n=1}^{\infty} a_n = \infty$,  $\sum_{n=1}^{\infty} a_n^2 < \infty$.

Define a recursive approximation scheme as follows. Let $x_1$ be arbitrary and define

(4.4)  $x_{n+1} = x_n + a_n z_n$

where $z_n = +1$ if $y_n \le \alpha$ and $z_n = -1$ if $y_n > \alpha$, and $y_n$ is a random variable
distributed according to $H(y \mid x_n)$. Then, by applying Theorem 1 with $\alpha = 0$
and $y_n = -z_n$, we obtain

THEOREM 3. If conditions (4.1), (4.2), and (4.3) hold, then $P\{\lim x_n = \theta\} = 1$.
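
Scheme (4.4) uses only the sign of $y_n - \alpha$, which makes it easy to sketch. The example below is an illustration under assumed ingredients that are not in the paper: $H(y \mid x)$ is taken to be normal with median $\alpha + (x - \theta)$ and unit standard deviation, $a_n = 1/n$, and the function names are hypothetical.

```python
import random


def sign_scheme(sample_y, alpha, x1, n_steps, a=lambda n: 1.0 / n):
    """Iterate (4.4): x_{n+1} = x_n + a_n * z_n with z_n = +1 if y_n <= alpha, else -1."""
    x = x1
    for n in range(1, n_steps + 1):
        y_n = sample_y(x)                      # one draw from H(y | x_n)
        z_n = 1.0 if y_n <= alpha else -1.0
        x = x + a(n) * z_n
    return x


def make_shifted_normal_sampler(theta, alpha, seed):
    """Hypothetical family: y ~ Normal(alpha + (x - theta), 1), so the median
    is below alpha for x < theta and above alpha for x > theta, as in (4.1)."""
    rng = random.Random(seed)

    def sample_y(x):
        return alpha + (x - theta) + rng.gauss(0.0, 1.0)

    return sample_y


alpha, theta = 0.5, 1.0
estimate = sign_scheme(make_shifted_normal_sampler(theta, alpha, seed=0),
                       alpha=alpha, x1=0.0, n_steps=20000)
```

Because only the indicator of $y_n \le \alpha$ enters the update, no moment assumptions on $H(y \mid x)$ are needed, which is the point of reducing the problem to Theorem 1.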
I should like to thank Mr. Lucien LeCam for many helpful discussions concerning
this problem. I should also like to thank the referee for pointing out that
the condition of uniform boundedness of M(x) in Section 2 could be replaced by
the present condition (2.1).


A NOTE ON THE ROBBINS-MONRO STOCHASTIC APPROXIMATION
METHOD¹


By GOPINATH KALLIANPUR
Institute for Advanced Study and University of California, Berkeley²


Received 1/19/53, revised 12/7/53.

¹ This work was begun with the partial support of the Office of Naval Research when the
author was at the Statistical Laboratory, University of California, Berkeley.

² At present at the Indian Statistical Institute.

Introduction. The almost certain convergence of the RM process and related
stochastic approximation procedures is proved by Blum [1] in a paper appearing
elsewhere in this issue. In the present note we consider the method originally
proposed by Robbins and Monro [2] with a further restriction on the constants
$a_n$. Our aim is to obtain, by elementary methods, an estimate of the order of
magnitude of $b_n = E(x_n - \theta)^2$. This estimate is sharp enough to enable us to
prove strong convergence for certain types of sequences $a_n$. The method adopted
in [1], while being more general, does not yield information about the behavior
of $b_n$ for large n. Using the notation and assumptions of [2] we state our results
in the following

THEOREM. Let $G_1 n^{-\delta} \le a_n \le G_2 n^{-\delta}$ for all n, where $G_1$ and $G_2$ are positive
constants and $\tfrac23 < \delta \le 1$. If either
(i) $\tfrac23 < \delta < 1$, and

(1)  $K > \tfrac12 (G_2/G_1)(C + |\alpha|)$,

or (ii) $\delta = 1$, and

(2)  $K > 2(G_2/G_1)(C + |\alpha|)$,

then

(3)  $P(\lim_{n \to \infty} x_n = \theta) = 1.$

The appropriate estimates for the order of magnitude of $b_n$ are given by (9),
(15), (16) and (18) below.


Proof. We shall briefly indicate the proof of (i). Let r be a positive constant
less than $(2KG_1 - (C + |\alpha|)G_2)/(1 - \delta)$, and let $A = (C + |\alpha|)G_2/(1 - \delta)$.
Then using (21) of [2] and the inequality

$1 + 2^{-\delta} + \cdots + j^{-\delta} \le j^{1-\delta}/(1 - \delta)$  (for all j large enough),

we have

(4)  $A_j \le (A + r) j^{1-\delta}$  (for j sufficiently large).

From (4) and the easily verifiable relations [2]

$b_{j+1} = b_j - 2a_j d_j + a_j^2 e_j$,  $d_j \ge b_j K/A_j$,  and  $e_j \le (C + |\alpha|)^2 < \infty$

we obtain

(5)  $b_{j+1} \le b_j q_j + M_1 j^{-2\delta}$  $(j \ge m)$,

where $q_j = 1 - B j^{-1}$, and $B = 2KG_1/(A + r) > 1 - \delta$ by the choice of r.
Here and in the sequel the letter M with or without a suffix denotes a constant
independent of n. Putting $j = m, m + 1, \dots, n$ successively in (5) and setting
$Q_n = \prod_{j=m}^{n} q_j$ and $R_n = 1 + m^{-2\delta} Q_m^{-1} + \cdots + n^{-2\delta} Q_n^{-1}$, we have

(6)  $b_{n+1} \le M Q_n R_n.$

Also, since $\sum_{j=m}^{n} j^{-1} \sim \log n$,

(7)  $Q_n = O(n^{-B}).$


For the estimation of $R_n$ we consider three different possibilities:
a) $B > 2\delta - 1$. Easy computation shows that

(8)  $R_n = O(n^{1+B-2\delta})$

which together with (6) and (7) leads to the result

(9)  $b_n = O(n^{1-2\delta}).$

Now choose $\beta$ to satisfy

(10)  $(2\delta - 1)^{-1} < \beta < (1 - \delta)^{-1}$,

with $2\delta - 1 > 1 - \delta$ since $\delta > \tfrac23$. For $k = 1, 2, \dots$ define the subsequence
$n_k = [k^{\beta}]$. Then Tchebycheff's inequality yields the simple estimate

(11)  $P(|x_{n_k} - \theta| > \epsilon) = O(k^{-\beta(2\delta-1)}).$

Since $\beta(2\delta - 1) > 1$ from (10), using (11) and applying the Borel-Cantelli lemma,
we have

(12)  $\lim_{k \to \infty} x_{n_k} = \theta$

with probability one. For $n_k \le n < n_{k+1}$,

(13)  $|x_n - \theta| \le |x_n - x_{n_k}| + |x_{n_k} - \theta| \le M \sum_{j=n_k}^{n_{k+1}} j^{-\delta} + |x_{n_k} - \theta|.$

Since $n_{k+1} - n_k = O(k^{\beta-1})$,

(14)  $\sum_{j=n_k}^{n_{k+1}} j^{-\delta} = O(k^{\beta(1-\delta)-1}) = o(1)$

as n and hence k tends to infinity. The last remark follows from the right side
of (10). Relations (12), (13) and (14) establish (3).
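
The step from the recursion (5) to the order estimate (9) can be checked numerically in case a). The sketch below iterates (5) with equality; the function names and the particular values of $B$, $\delta$, $M_1$ and the starting value are assumptions chosen only for illustration. If (9) holds, the iterate multiplied by $n^{2\delta-1}$ should stay bounded; a heuristic balance of the leading terms suggests it approaches $M_1/(B - (2\delta - 1))$.

```python
def iterate_recursion(B, delta, n, M1=1.0, b_m=1.0, m=1):
    """Iterate b_{j+1} = b_j * (1 - B / j) + M1 * j**(-2*delta) for j = m, ..., n.

    This is relation (5) taken with equality, so the value returned is an
    upper envelope for b_{n+1} under the stated assumptions.
    """
    b = b_m
    for j in range(m, n + 1):
        b = b * (1.0 - B / j) + M1 * j ** (-2.0 * delta)
    return b


def scaled_bound(B, delta, n):
    """Return the envelope for b_{n+1} multiplied by (n+1)**(2*delta - 1)."""
    return iterate_recursion(B, delta, n) * (n + 1) ** (2.0 * delta - 1.0)


# Case a): B > 2*delta - 1 (here 0.9 > 0.6), with 2/3 < delta < 1.
ratios = [scaled_bound(0.9, 0.8, n) for n in (1000, 10000, 100000)]
```

The three scaled values should be of the same order of magnitude, which is what $b_n = O(n^{1-2\delta})$ asserts.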
b) $1 - \delta < B < 2\delta - 1$. In this case, since $2\delta - B > 1$, $R_n = O(1)$, from
which we obtain

(15)  $b_n = O(n^{-B}).$

c) $B = 2\delta - 1$. This gives $R_n = O(\log n)$ and

(16)  $b_n = O(n^{-B} \log n).$

Combining b) and c) we may write

(17)  $b_n = O(n^{-\mu})$

where it is understood that $\mu = B$ in case b) and $1 - \delta < \mu < B$ in case c).
The rest of the proof is the same as in a).

In the proof of (ii) similar computations give the following estimate for $b_n$:

(18)  $b_n = O((\log n)^{-\mu_0})$, where $\mu_0 = KG_1/[(C + |\alpha|)G_2] - 1.$

The proof of (3) is accomplished by taking $n_k = [\exp(k^{\beta})]$ $(\mu_0^{-1} < \beta < 1)$.


A PROPERTY OF THE NORMAL DISTRIBUTION

By EUGENE LUKACS AND EDGAR P. KING
National Bureau of Standards


1. Summary. The following theorem is proved. 

Let $X_1, X_2, \dots, X_n$ be n independently (but not necessarily identically)
distributed random variables, and assume that the nth moment of each
$X_i$ $(i = 1, 2, \dots, n)$ exists. The necessary and sufficient conditions for the
existence of two statistically independent linear forms $Y_1 = \sum_{s=1}^{n} a_s X_s$ and
$Y_2 = \sum_{s=1}^{n} b_s X_s$ are:

(A) Each random variable which has a nonzero coefficient in both forms is
normally distributed.

(B) $\sum_{s=1}^{n} a_s b_s \sigma_s^2 = 0.$

Here $\sigma_s^2$ denotes the variance of $X_s$ $(s = 1, 2, \dots, n)$.

For n = 2 and $a_1 = b_1 = a_2 = 1$, $b_2 = -1$ this reduces to a theorem of S.
Bernstein [1]. Bernstein's paper was not accessible to the authors, whose
knowledge of his result was derived from a statement of S. Bernstein's theorem
contained in a paper by M. Fréchet [3]. A more general result, not assuming the
existence of moments, was obtained earlier by M. Kac [4]. A related theorem,
assuming equidistribution of the $X_i$ $(i = 1, 2, \dots, n)$, is stated without proof in
a recent paper by Yu. V. Linnik [5].

2. Introduction. We consider two linear forms

(1)  $Y_1 = \sum_{s=1}^{n} a_s X_s$,  $Y_2 = \sum_{s=1}^{n} b_s X_s$

in the n independently distributed random variables $X_1, X_2, \dots, X_n$. We
arrange the variables so that the first p $(X_1, X_2, \dots, X_p)$ have nonzero
coefficients in both forms and the remaining (n - p) have zero coefficients in one form
or the other. Clearly $0 \le p \le n$. When p = 0, $Y_1$ and $Y_2$ are trivially independent;
when p = 1, $Y_1$ and $Y_2$ cannot be independent. For $p \ge 2$, it is clear that the
statistical independence of the original linear forms (1) is completely equivalent
to the independence of the forms $Z_1 = \sum_{s=1}^{p} a_s X_s$ and $Z_2 = \sum_{s=1}^{p} b_s X_s$. This
means that when p < n the distributions of the random variables $X_{p+1}, \dots, X_n$
do not affect the independence of $Y_1$ and $Y_2$. This is why the theorem contains
only a statement about the distributions of those random variables with
nonzero coefficients in both forms.

If for some pairs of corresponding coefficients, say the first r $(1 < r \le p)$,
the relation

(2)  $a_1/b_1 = a_2/b_2 = \cdots = a_r/b_r = C$

holds, then we can rewrite $Z_1$ and $Z_2$ as

$Z_1 = C(b_1 X_1 + \cdots + b_r X_r) + a_{r+1} X_{r+1} + \cdots + a_p X_p,$
$Z_2 = (b_1 X_1 + \cdots + b_r X_r) + b_{r+1} X_{r+1} + \cdots + b_p X_p.$

Introducing the new variable $\bar{X}_r = b_1 X_1 + \cdots + b_r X_r$, we see that the
independence of $Y_1$ and $Y_2$ is equivalent to the independence of the forms
$Z_1 = C\bar{X}_r + a_{r+1} X_{r+1} + \cdots + a_p X_p$ and $Z_2 = \bar{X}_r + b_{r+1} X_{r+1} + \cdots + b_p X_p$. If
the theorem holds for the forms $Z_1$ and $Z_2$, Cramér's theorem [2] shows that the
normality of $\bar{X}_r$ implies the normality of the random variables $X_1, X_2, \dots, X_r$.
We proceed in the same manner if there are several groups of random variables
for which a relation of type (2) holds. Hence our problem reduces to the study
of the independence of two linear forms whose coefficient matrix contains no
vanishing minor of order 2.

Finally it is clear that the independence of $Y_1$ and $Y_2$ is equivalent to the
independence of the forms $\sum_{s} a_s (X_s - E[X_s])$ and $\sum_{s} b_s (X_s - E[X_s])$.
Therefore we shall assume without loss of generality that the following conditions
are satisfied:

(i) $a_s b_s \ne 0$;
(ii) $a_s b_t - a_t b_s \ne 0$ for all $s \ne t$; $s, t = 1, \dots, n$;
(iii) $E[X_s] = 0$.


3. The functional equation for the characteristic functions. Denote the
distribution function of the random variable $X_s$ $(s = 1, \dots, n)$ by $F_s(x)$ and the
corresponding characteristic function by $f_s(t)$. Let $h(u, v)$ be the c.f. of the joint
distribution of $Y_1$ and $Y_2$ and write $h_1(u) = h(u, 0)$ and $h_2(v) = h(0, v)$. Clearly
$h_1(u)$ and $h_2(v)$ are the c.f.'s of the distributions of $Y_1$ and $Y_2$, respectively.

We prove first that our conditions are necessary; that is, we assume that
$Y_1$ and $Y_2$ are statistically independent. In terms of characteristic functions
this means

(3)  $h(u, v) = h_1(u) h_2(v).$

Further, because $X_1, \dots, X_n$ are independent, we have

(4)  $h_1(u) = \prod_{s=1}^{n} f_s(a_s u),$

(5)  $h_2(v) = \prod_{s=1}^{n} f_s(b_s v),$

(6)  $h(u, v) = \prod_{s=1}^{n} f_s(a_s u + b_s v).$

Finally, substituting (4), (5), and (6) in (3) we obtain the following functional
equation in the characteristic functions:

(7)  $\prod_{s=1}^{n} f_s(a_s u + b_s v) = \prod_{s=1}^{n} f_s(a_s u) f_s(b_s v).$

4. The differential equations for the cumulant generating functions. The
general procedure for determining the explicit form of the characteristic functions
$f_s(t)$ will be to differentiate the logarithm of (7) r times $(r = 1, 2, \dots, n)$ with
respect to u, set u = 0, and solve the resulting n differential equations for $\ln f_s(t)$
$(s = 1, \dots, n)$.

We first note that $f_s(0) = 1$ $(s = 1, \dots, n)$ and that $f_s(t)$ is a continuous
function of t. Therefore there exists a neighborhood of the origin in which all the
factors occurring in (7) are different from zero. This neighborhood could of
course be the entire plane. In the following derivation we restrict the values of
u and v to this neighborhood; then we may take the logarithm of both sides of
(7) and obtain

(9)  $\sum_{s=1}^{n} \phi_s(a_s u + b_s v) = \sum_{s=1}^{n} \phi_s(a_s u) + \sum_{s=1}^{n} \phi_s(b_s v),$

where $\phi_s(x) = \ln f_s(x)$. Differentiating (9) r times with respect to u and setting
u = 0 yields

(10)  $\sum_{s=1}^{n} \left[\frac{\partial^r}{\partial u^r} \phi_s(a_s u + b_s v)\right]_{u=0} = \sum_{s=1}^{n} \left[\frac{d^r}{du^r} \phi_s(a_s u)\right]_{u=0}.$

Letting $z_s = a_s u$, we find that the typical term on the left side of (10) becomes

(11)  $\left[\frac{\partial^r}{\partial u^r} \phi_s(a_s u + b_s v)\right]_{u=0} = a_s^r \left[\frac{\partial^r}{\partial z_s^r} \phi_s(z_s + b_s v)\right]_{z_s=0}.$

With the substitution $\psi_s(v) = \phi_s(b_s v)$, (11) becomes

(12)  $\left[\frac{\partial^r}{\partial u^r} \phi_s(a_s u + b_s v)\right]_{u=0} = \left(\frac{a_s}{b_s}\right)^r \frac{d^r}{dv^r} \psi_s(v).$

Similarly the typical term on the right side of (10) becomes

(13)  $\left[\frac{d^r}{du^r} \phi_s(a_s u)\right]_{u=0} = a_s^r \left[\frac{d^r}{dz_s^r} \phi_s(z_s)\right]_{z_s=0} = (i a_s)^r \kappa_r^{(s)},$

where $\kappa_r^{(s)}$ is the rth order cumulant of $X_s$. Substituting (12) and (13) in (10)
we obtain

(14)  $\sum_{s=1}^{n} \xi_s^r \frac{d^r}{dv^r} \psi_s(v) = \sum_{s=1}^{n} (i a_s)^r \kappa_r^{(s)}$

where $\xi_s = a_s/b_s$. Differentiating (14) (n - r) times yields the system of
differential equations

(15)  $\sum_{s=1}^{n} \xi_s^r \frac{d^n}{dv^n} \psi_s(v) = 0$ for $r = 1, \dots, n - 1$;  $\sum_{s=1}^{n} \xi_s^n \frac{d^n}{dv^n} \psi_s(v) = \sum_{s=1}^{n} (i a_s)^n \kappa_n^{(s)}.$

We have to determine all the distribution functions whose characteristic
functions satisfy this system of differential equations and the initial conditions

(15a)  $\left[\frac{d^j}{dv^j} \psi_s(v)\right]_{v=0} = (i b_s)^j \kappa_j^{(s)}$,  $\psi_s(0) = 0.$

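We now define $D_n$ as the determinant of the coefficients of the system (15),

$D_n = \det\left(\xi_s^r\right)$  $(r, s = 1, \dots, n)$,

and denote by $D_{s,n}$ the cofactor of the element in the sth column and the nth
row of $D_n$. Considering (15) as a system of n linear equations in the quantities
$d^n \psi_s(v)/dv^n$, we obtain the solutions

(16)  $\frac{d^n}{dv^n} \psi_s(v) = \frac{D_{s,n}}{D_n} \sum_{t=1}^{n} (i a_t)^n \kappa_n^{(t)} = i^n C_{s,n}$, say.

Integrating (16) n times and employing the initial conditions (15a) yields

$\psi_s(v) = \sum_{j=1}^{n-1} \frac{\kappa_j^{(s)}}{j!} (i b_s v)^j + \frac{C_{s,n}}{n!} (i v)^n.$

Since $f_s(b_s v) = \exp[\phi_s(b_s v)] = \exp[\psi_s(v)]$ we have

(17)  $f_s(b_s v) = \exp\left[\sum_{j=1}^{n-1} \frac{\kappa_j^{(s)}}{j!} (i b_s v)^j + \frac{C_{s,n}}{b_s^n \, n!} (i b_s v)^n\right].$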

In case any of the functions $f_s(t)$ become zero for some real t, this solution
is valid only in a certain neighborhood of the origin. We next show by an indirect
proof that none of the functions $f_s(t)$ $(s = 1, \dots, n)$ has a real zero; from this we
can conclude that (17) is valid for all real t.

Let us therefore assume that one or more of the c.f.'s $f_s(t)$ have zeros. In this
case at least one of the functions $f_s(b_s v)$ will have a zero. Denote by $v_1$ the zero
closest to the origin and by $f_p(t)$ a function for which $f_p(b_p v_1) = 0$. For $|v| < |v_1|$
we have $f_s(b_s v) \ne 0$ $(s = 1, \dots, n)$ and formula (17) is valid. Let v be a real
number such that $|v| < |v_1|$; then we have by (17)

(18)  $f_p(b_p v) = \exp\left[\sum_{j=1}^{n-1} \frac{\kappa_j^{(p)}}{j!} (i b_p v)^j + \frac{C_{p,n}}{b_p^n \, n!} (i b_p v)^n\right].$

But $f_p(t)$ is a continuous function. Hence $\lim_{v \to v_1} f_p(b_p v) = f_p(b_p v_1) = 0$ by
assumption. However, from (18) it is clear that

$\lim_{v \to v_1} f_p(b_p v) = \exp\left[\sum_{j=1}^{n-1} \frac{\kappa_j^{(p)}}{j!} (i b_p v_1)^j + \frac{C_{p,n}}{b_p^n \, n!} (i b_p v_1)^n\right]$

which is always different from zero. This is a contradiction, and hence formula
(17) is valid for all values of v. Writing $t = b_s v$ we finally obtain

(19)  $f_s(t) = \exp\left[\sum_{j=1}^{n-1} \frac{\kappa_j^{(s)}}{j!} (i t)^j + \frac{C_{s,n}}{b_s^n \, n!} (i t)^n\right].$

5. Proof of the theorem. We have determined all the solutions of the system
(15) satisfying the initial conditions (15a). In order to find the distribution
functions whose characteristic functions satisfy this system we must select
those functions (19) which are characteristic functions. This is easily done by
means of the following result [6].

Theorem of Marcinkiewicz. No function of the form $\exp(a_1 t + a_2 t^2 + \cdots + a_r t^r)$ $(r > 2)$ can
be a characteristic function.

Hence the degree of the polynomial in (19) cannot exceed 2. In case n > 2
we must have

$\kappa_j^{(s)} = 0$,  $j = 3, 4, \dots, n - 1$;  $s = 1, 2, \dots, n$;  $n > 3$,
$C_{s,n} = 0$,  $s = 1, 2, \dots, n$;  $n > 2$.

Because the factor $D_{s,n}/D_n$ in $C_{s,n}$ cannot vanish, these relations reduce to

(20)  $\kappa_j^{(s)} = 0$  $(j = 3, \dots, n - 1;\ n > 3)$,  $\sum_{s=1}^{n} a_s^n \kappa_n^{(s)} = 0$  $(n > 2)$.

There is no restriction if n = 2. In view of (iii), $\kappa_1^{(s)} = 0$ also, and (19) becomes

(21)  $f_s(t) = \exp[-\tfrac12 \sigma_s^2 t^2]$,  $n > 2$.

This shows that each $X_s$ $(s = 1, \dots, n)$ must be normally distributed, which
is condition (A) of the theorem. All cumulants of order r > 2 vanish for a normal
distribution, hence equations (20) impose no additional restrictions. In case
n = 2 we have

(22)  $f_s(t) = \exp[-\tfrac12 k t^2]$

where k is determined from (16) and (19). The independence of $Y_1$ and $Y_2$
implies that they are uncorrelated, which yields condition (B) and completes the
first part of the proof.

It is easy to establish that conditions (A) and (B) are also sufficient. Assuming
that (A) and (B) hold, it follows that $Y_1$ and $Y_2$ are uncorrelated and normally
distributed. Hence $Y_1$ and $Y_2$ must be independent.

For n = 2 and $a_1 = a_2 = b_1 = 1$, $b_2 = -1$ we obtain from (22)

$f_s(t) = \exp[-\tfrac12 (\sigma_1^2 + \sigma_2^2) t^2/2].$

This shows that $\sigma_1 = \sigma_2$ and establishes Bernstein's theorem.

ADDENDUM

The authors are indebted to Professor G. Darmois for calling their attention to
his note in the C. R. Acad. Sci. Paris, Vol. 232 (1951), pp. 1999-2000 in which he
proved the theorem for n = 2 without assuming the existence of moments. He
later extended this to the case of arbitrary n. His paper will be published in the
Bulletin of the International Statistical Institute. The method of proof used by
Professor Darmois is different from the one presented in this paper. The authors
learned that these results were also obtained by methods similar to Darmois'
by B. V. Gnedenko (Izvestiya Akad. Nauk. SSSR, Ser. Mat., Vol. 12 (1948),
pp. 97-100) for the case n = 2 and by V. P. Skitovich (Doklady Akad. Nauk.
SSSR (N.S.) Vol. 89 (1953), pp. 217-219) for any n.

While reading the proofs of this paper the authors learned that the theorem
was also discussed by M. Loève in the appendix to P. Lévy's "Processus
stochastiques", Gauthier-Villars, Paris, 1948, pp. 337-338.


ON OPTIMAL SYSTEMS

By DAVID BLACKWELL
Howard University

1. Summary. For any sequence $x_1, x_2, \dots$ of chance variables satisfying
$|x_n| \le 1$ and $E(x_n \mid x_1, \dots, x_{n-1}) \le -u(\max |x_n| \mid x_1, \dots, x_{n-1})$, where u
is a fixed constant, 0 < u < 1, and for any positive number t,

$\Pr\{\sup_n (x_1 + \cdots + x_n) \ge t\} \le \left(\frac{1-u}{1+u}\right)^t.$

Equality holds for integral t when $x_1, x_2, \dots$ are independent with

$\Pr\{x_n = 1\} = (1 - u)/2$,  $\Pr\{x_n = -1\} = (1 + u)/2.$

This has a simple interpretation in terms of gambling systems, and yields a
new proof of Lévy's extension of the strong law of large numbers to dependent
variables [2], with an improved estimate for the rate of convergence.


2. The theorem and its interpretation. We consider a gambling house which
will play any game named by the customer, provided that (1) the customer's
maximum gain or loss does not exceed one unit and (2) the customer's
expectation does not exceed -ug, where g is his maximum gain or loss. A customer
with unlimited credit wishes to devise a system of play which will maximize
his probability of eventually becoming at least t units ahead, where t is a fixed
positive number. Thus a system is a sequence $x_1, x_2, \dots$ of chance variables
satisfying

(1)  $|x_n| \le 1,$
(2)  $E(x_n \mid x_1, \dots, x_{n-1}) \le -u(\max |x_n| \mid x_1, \dots, x_{n-1}).$

A particular system is obtained by letting $x_1, x_2, \dots$ be independent, with
$\Pr\{x_n = 1\} = (1 - u)/2$ and $\Pr\{x_n = -1\} = (1 + u)/2$. For this system,
it is known ([1], p. 290) that

(3)  $\Pr\{\max_n (x_1 + \cdots + x_n) \ge t\} = \left(\frac{1-u}{1+u}\right)^t$

for any positive integer t. Our theorem is that this is the best system in the
sense of maximizing the probability of eventually attaining t, that is, we shall
prove the

THEOREM. For any system $x_1, x_2, \dots$ satisfying (1) and (2), and any positive
number t,

$\Pr\{x_1 + \cdots + x_n \ge t \text{ for some } n\} \le \left(\frac{1-u}{1+u}\right)^t.$

PROOF. For any real number t and any system S, let

$\phi(N, S, t) = \Pr\{\max_{0 \le k \le N} (x_1 + \cdots + x_k) \ge t\}$,  $\phi(N, t) = \sup_S \phi(N, S, t).$

In particular $\phi(0, S, t) = 1$ for $t \le 0$, $= 0$ for $t > 0$. We shall show that

(4)  $\phi(N + 1, t) = \sup_{x \in X} E\phi(N, t - x),$

where X consists of all chance variables x satisfying $|x| \le 1$ and $Ex \le -u \max |x|$.
Actually (4) is intuitively clear; it asserts that, to maximize the
probability of reaching t during N + 1 trials one must, for each value of $x_1$,
use that system in the remaining N trials which maximizes the probability
of attaining the new required sum $t - x_1$, and one must choose $x_1$ so that the
average probability of attainment in the remaining trials is maximized.

To avoid tedious measurability difficulties, we remark that we may restrict
attention to those systems for which (a) each $x_n$ assumes a finite set of values,
all rational, (b) $x_n = 0$ for sufficiently large n, and (c) the probability of any
particular sequence $x_1, x_2, \dots$ is rational. Denote by $\mathcal{S}$ the countable set of
systems satisfying (a), (b), and (c).

Now any system $S \in \mathcal{S}$ is described by specifying the initial variable $x_1$ and,
for each value v of $x_1$, the system $S(v) \in \mathcal{S}$ to be used thereafter when $x_1 = v$.

We have, for $S \in \mathcal{S}$, t > 0,

(5)  $\phi(N + 1, S, t) = E\phi(N, S(x_1), t - x_1),$

so that

(6)  $\phi(N + 1, S, t) \le E\phi(N, t - x_1) \le \sup_{x \in X} E\phi(N, t - x).$

Taking the sup over $S \in \mathcal{S}$ in (6) yields

(7)  $\phi(N + 1, t) \le \sup_{x \in X} E\phi(N, t - x).$

On the other hand, (5) yields

(8)  $E\phi(N, S(x_1), t - x_1) \le \phi(N + 1, t).$

For a fixed initial variable $x_1$, allowing $S(v)$ to range over all $S \in \mathcal{S}$,
independently for the different values of $x_1$, yields

(9)  $E\phi(N, t - x_1) \le \phi(N + 1, t).$

Since any $x \in X$ is an admissible initial variable, from (9) we obtain

(10)  $\sup_{x \in X} E\phi(N, t - x) \le \phi(N + 1, t).$

Inequalities (7) and (10) yield (4) for t > 0; for $t \le 0$, (4) is obvious, since
$\phi(N + 1, t) = 1$ and $E\phi(N, t - x) = 1$ for x = 0.

To continue the proof of the theorem, we consider the transformation U,
taking Borel-measurable functions of t into Borel-measurable functions of t,
defined by

(11)  $Uf(t) = \sup_{x \in X} Ef(t - x).$

Equation (4) asserts that $U\phi(N, t) = \phi(N + 1, t)$. We verify, for $g(t) =
[(1 - u)/(1 + u)]^t$, that Ug = g. To see this, fix $t_0$ and d, with $0 < d \le 1$, and
let h(t) be the linear function of t with $h(t_0 - d) = g(t_0 - d)$ and $h(t_0 + d) =
g(t_0 + d)$. Then for any $x \in X$ with $\sup |x| = d$

$Eg(t_0 - x) \le Eh(t_0 - x) = h(t_0 - E(x)) \le h(t_0 + ud),$

with equality if and only if x assumes only the values +d, -d and E(x) = -ud.
Now

$h(t) = g(t_0 - d) + \frac{g(t_0 + d) - g(t_0 - d)}{2d} (t - t_0 + d),$

so that

$r(d) = h(t_0 + ud) = g(t_0)[\tfrac12 (1 + u) g(d) + \tfrac12 (1 - u) g(-d)].$

Since $r(0) = r(1) = g(t_0)$ and r is convex in d, $r(d) \le g(t_0)$ for all d, $Eg(t_0 - x) \le
g(t_0)$ for all $x \in X$, with equality only for x = 0 and $x = \pm 1$ with probabilities
$\tfrac12 (1 \mp u)$, and Ug = g.
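
The key inequality $r(d) \le g(t_0)$, with equality at d = 0 and d = 1, is easy to confirm numerically. The short sketch below does this on a grid of d values; the function names and the particular u and $t_0$ are assumptions made for illustration.

```python
def g(t, u):
    """g(t) = ((1-u)/(1+u))**t, the candidate fixed point of the transformation U."""
    return ((1.0 - u) / (1.0 + u)) ** t


def r(d, t0, u):
    """r(d) = g(t0) * [ (1+u)/2 * g(d) + (1-u)/2 * g(-d) ], the chord value h(t0 + u*d)."""
    return g(t0, u) * (0.5 * (1.0 + u) * g(d, u) + 0.5 * (1.0 - u) * g(-d, u))


u, t0 = 0.3, 1.7
grid = [k / 100.0 for k in range(101)]              # d = 0.00, 0.01, ..., 1.00
largest_excess = max(r(d, t0, u) - g(t0, u) for d in grid)
```

Since r is a positive combination of two exponentials in d it is convex, so its values on [0, 1] lie below the common endpoint value $g(t_0)$; `largest_excess` should therefore be zero up to rounding.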

To complete the proof of the theorem, we note that $f_1 \le f_2$ for all t implies
$Uf_1 \le Uf_2$ for all t. Since $\phi(0, t) = 1$ for $t \le 0$ and 0 for t > 0, $\phi(0, t) \le g(t)$ for
all t. If $\phi(N, t) \le g(t)$ for all t, applying U yields

$U\phi(N, t) = \phi(N + 1, t) \le Ug(t) = g(t)$

for all t so that, by induction, $\phi(N, t) \le g(t)$ for all t, N. Consequently
$\lim_{N \to \infty} \phi(N, t) \le g(t)$. But for any system S,

$\Pr\{x_1 + \cdots + x_n \ge t \text{ for some } n\} = \lim_{N \to \infty} \phi(N, S, t) \le \lim_{N \to \infty} \phi(N, t) \le g(t) = [(1 - u)/(1 + u)]^t,$

and the proof is complete.
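COROLLARY. If $x_1, x_2, \dots$ satisfy $|x_n| \le 1$ and $E(x_n \mid x_1, \dots, x_{n-1}) = 0$,
then $(x_1 + \cdots + x_n)/n \to 0$ with probability 1; in fact

(12)  $\Pr\left\{\left|\frac{x_1 + \cdots + x_n}{n}\right| \ge \epsilon \text{ for some } n \ge N\right\} \le 2\left(\frac{1}{1 + \epsilon}\right)^{\epsilon N/(2+\epsilon)}.$

PROOF.

$\Pr\left\{\frac{x_1 + \cdots + x_n}{n} \ge \epsilon \text{ for some } n \ge N\right\}$
$\le \Pr\{(x_1 - \epsilon/2) + \cdots + (x_n - \epsilon/2) \ge \epsilon N/2 \text{ for some } n\}$
$\le \left(\frac{1}{1 + \epsilon}\right)^{\epsilon N/(2+\epsilon)},$

where the last inequality is obtained by applying the theorem to the sequence
$y_n = (x_n - \epsilon/2)/(1 + \epsilon/2)$, with $t = \epsilon N/(2 + \epsilon)$. The same inequality holds
for $\Pr\{(x_1 + \cdots + x_n)/n \le -\epsilon \text{ for some } n \ge N\}$, and the corollary follows.
The part of the corollary on convergence with probability 1 is due to Lévy
([2], p. 252). However, his method of proof does not yield a geometric rate
of convergence in the sense specified by (12).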
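
The bound obtained in the proof can be evaluated directly. The sketch below is illustrative and not part of the paper; the function names are assumptions. It computes the one-sided bound, checks that the choice $u = \epsilon/(2+\epsilon)$ for the rescaled sequence $y_n$ gives $(1-u)/(1+u) = 1/(1+\epsilon)$, and compares the per-observation rate with the first terms of its expansion in powers of $\epsilon$.

```python
def one_sided_bound(eps, N):
    """Bound (1/(1+eps))**(eps*N/(2+eps)) for the one-sided event in the proof."""
    return (1.0 / (1.0 + eps)) ** (eps * N / (2.0 + eps))


def rate(eps):
    """Geometric rate r such that the one-sided bound equals r**N."""
    return (1.0 / (1.0 + eps)) ** (eps / (2.0 + eps))


def ratio_from_u(eps):
    """(1-u)/(1+u) for u = eps/(2+eps), the drift constant of the rescaled y_n."""
    u = eps / (2.0 + eps)
    return (1.0 - u) / (1.0 + u)


eps = 0.1
# leading terms 1 - eps^2/2 + eps^3/2 of the expansion of the rate
leading_terms = 1.0 - eps ** 2 / 2.0 + eps ** 3 / 2.0
```

For small $\epsilon$ the rate is close to $1 - \epsilon^2/2$, so the bound becomes informative only once N is of order $1/\epsilon^2$ or larger.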

Added in proof. T. E. Harris has kindly called my attention to a result of S.
Bernstein (see J. V. Uspensky, Introduction to Mathematical Probability, McGraw-
Hill Book Company, Inc., New York and London, 1937, pp. 204-205, problems
12-15), which yields a geometric rate for independent variables under conditions
weaker than uniform boundedness. Moreover, for the case of independent $x_n$
with $|x_n| \le 1$, Bernstein's rate is slightly better than that given here, having
an expansion $r = 1 - (\epsilon^2/2) + (\epsilon^3/6) + \cdots$ as compared with $r = 1 - (\epsilon^2/2) +
(\epsilon^3/2) + \cdots$ for the rate given here.


DENSITY UNBIASED POINT ESTIMATES

By RAYMOND P. PETERSON
University of Washington and University of California, Riverside


1. Summary. A new concept of unbiasedness (density unbiasedness) for point
estimates is introduced and the "best" density unbiased point estimate for the
mean of any normal distribution is proved to be the ordinary sample mean
$\bar{x} = \sum_{i=1}^{n} x_i/n$. Under certain conditions on the form of the characteristic
function of a family of probability density functions involving an unknown location
parameter, $\bar{x}$ is shown to be a density unbiased point estimate of the location
parameter.


2. Introduction. Let $x_1, \dots, x_n$ denote n (not necessarily independent)
random variables each of which is distributed over a space M according to a
distribution function $P(x, \theta, \theta_1, \dots, \theta_s)$. It is assumed that $P(x, \theta, \theta_1, \dots, \theta_s)$
is completely specified except for the s + 1 parameters $\theta, \theta_1, \dots, \theta_s$. These
parameters may be represented by a point $(\theta, \sigma)$ in the (s + 1)-dimensional
parameter space $\Omega_{s+1}$, where $\sigma = (\theta_1, \dots, \theta_s)$. Also $X_n = (x_1, \dots, x_n)$ is a point
in the n-dimensional sample space $M_n$. We shall assume here that $P(x, \theta, \sigma)$ is
absolutely continuous and denote the corresponding probability density
function by $p(x, \theta, \sigma)$. Let $p_n(X_n, \theta, \sigma) = p_n(x_1, \dots, x_n, \theta, \sigma)$ denote the joint
probability density function at the point $X_n \in M_n$.

A statistical point estimate of the parameter $\theta$, which ranges over a subset
$\omega$ of the real line, is a function $f(X_n)$ of the sample values $x_1, \dots, x_n$, whose
range is the same subset $\omega$.

Let $\mathcal{F}$ be the (s + 1)-parameter family of probability density functions

$p(x, \theta, \sigma) = p(x, \theta, \theta_1, \dots, \theta_s).$

The mean probability density function of $\mathcal{F}$ generated by $f(X_n)$ relative to $\theta$ is given
by

(2.1)  $\phi_f(x, \theta, \sigma) = \int_{M_n} p(x, f(X_n), \sigma)\, p_n(X_n, \theta, \sigma)\, dX_n.$

It is readily seen that $\phi_f(x, \theta, \sigma)$ is a probability density function provided
$p(x, f(X_n), \sigma)$ is measurable in $X_n$ over $M_n$.
A point estimate $f(X_n)$ of $\theta$ will be called density unbiased if

$\phi_f(x, \theta, \sigma) = p(x, \theta, \sigma'),$

where $\sigma'$ is some value of $\sigma$. There are various criteria by which we might choose
a "best" density unbiased estimate from the class of all density unbiased
estimates, provided this class is not empty. We shall call an estimate $f(X_n)$ of $\theta$
a best density unbiased estimate if

i) $f(X_n)$ is a density unbiased point estimate of $\theta$, and
ii) $(\sigma' - \sigma)^2 = \sum_{i=1}^{s} (\theta_i' - \theta_i)^2$ is minimized by $f(X_n)$ with respect to all
density unbiased estimates of $\theta$.


3. The best density unbiased point estimate for the mean of a normal
population. Let $\mathcal{F}$ denote the two-parameter family of probability density functions

$p(x, \theta, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\theta)^2/2\sigma^2}$

and let $f(X_n)$ be any point estimate of $\theta$ where $x_1, \dots, x_n$ are n independent
sample values of x. Now $f(X_n)$ will be density unbiased if and only if $\phi_f(x, \theta, \sigma)$
possesses a characteristic function of the form

(3.1)  $e^{i\theta t - \sigma'^2 t^2/2}.$

THEOREM 3.1. An estimate $f(X_n)$ is a density unbiased estimate of the mean $\theta$
of a normal distribution if and only if $f(X_n)$ is itself normally distributed with
mean $\theta$.

PROOF. Taking the characteristic function of both sides of the identity (2.1)
we have

$\psi_f(t, \theta, \sigma) = \int_{-\infty}^{\infty} e^{itx} \phi_f(x, \theta, \sigma)\, dx$
$= \int_{-\infty}^{\infty} e^{itx} \int_{M_n} p(x, f(X_n), \sigma)\, p_n(X_n, \theta, \sigma)\, dX_n\, dx$
$= \int_{M_n} e^{itf(X_n) - \sigma^2 t^2/2}\, p_n(X_n, \theta, \sigma)\, dX_n.$

However, $\psi_f(t, \theta, \sigma)$ must be of the form (3.1). Therefore

(3.2)  $\int_{M_n} e^{itf(X_n)}\, p_n(X_n, \theta, \sigma)\, dX_n = e^{i\theta t - (\sigma'^2 - \sigma^2) t^2/2}.$

On the other hand

(3.3)  $\int_{M_n} e^{itf(X_n)}\, p_n(X_n, \theta, \sigma)\, dX_n = \int_{-\infty}^{\infty} e^{itf} g(f, \theta, \sigma)\, df$

where $g(f, \theta, \sigma)$ is the probability density function of $f(X_n)$. Since the right
hand side of equation (3.3) is the characteristic function of $f(X_n)$, it follows
from (3.2), (3.3) and the uniqueness theorem for characteristic functions [1] that
$f(X_n)$ must be normally distributed with mean $\theta$ and variance $\sigma'^2 - \sigma^2$.

It follows from Theorem 3.1 that in looking for a best density unbiased
estimate of $\theta$ we can restrict ourselves to the class $N_\theta$ of normally distributed
estimates with mean $\theta$. Now, the ordinary sample mean $\bar{x}$ is known to be normally
distributed with mean $\theta$ and minimum variance $\sigma^2/n$ among all estimates with
mean $\theta$, subject to certain regularity conditions which are satisfied by all
estimates in $N_\theta$. Therefore $\sigma'^2 - \sigma^2$ is a minimum for $f(X_n) = \bar{x}$.

Since $\sigma' > \sigma$, and $\bar{x}$ minimizes $\sigma'^2 - \sigma^2$, it immediately follows that $\bar{x}$
minimizes $(\sigma' - \sigma)^2$ with respect to the class of all density unbiased estimates of $\theta$.
Hence $\bar{x}$ is the best density unbiased estimate of the mean $\theta$.
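
For the sample mean, the mean probability density function (2.1) can be computed numerically and compared with the normal density having $\sigma'^2 = \sigma^2 + \sigma^2/n$. The sketch below is an illustration under stated assumptions: it integrates over the distribution of the estimate (normal with mean $\theta$ and variance $\sigma^2/n$) rather than over $M_n$, in the spirit of (3.3), and uses a plain trapezoidal rule. All function names, the grid size and the integration width are choices made for the example.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal probability density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mean_density_for_sample_mean(x, theta, sigma, n, grid=4001, width=10.0):
    """Approximate the integral of p(x, m, sigma) * g(m) over m, where g is the
    density of the sample mean, Normal(theta, sigma^2 / n), by the trapezoidal rule."""
    sd_mean = sigma / math.sqrt(n)
    lower = theta - width * sd_mean
    step = 2.0 * width * sd_mean / (grid - 1)
    total = 0.0
    for k in range(grid):
        m = lower + k * step
        weight = 0.5 if k in (0, grid - 1) else 1.0   # trapezoid end-point weights
        total += weight * normal_pdf(x, m, sigma) * normal_pdf(m, theta, sd_mean)
    return total * step


theta, sigma, n = 0.3, 1.2, 5
sigma_prime = sigma * math.sqrt(1.0 + 1.0 / n)        # sigma'^2 = sigma^2 + sigma^2 / n
points = (-1.0, 0.3, 2.5)
numeric = [mean_density_for_sample_mean(x, theta, sigma, n) for x in points]
closed_form = [normal_pdf(x, theta, sigma_prime) for x in points]
```

The two lists should agree closely, which illustrates that the sample mean is density unbiased with $\sigma'^2 - \sigma^2 = \sigma^2/n$.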


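4. Density unbiased point estimates of a location parameter. Let $\mathcal{F}$ denote
the two-parameter family of probability density functions

$p(x, \theta, \sigma) = \frac{1}{\sigma}\, p\!\left(\frac{x - \theta}{\sigma}\right)$,  $\sigma > 0$,

and let $x_1, \dots, x_n$ denote n independent observed values of x. In this case $\theta$
is called a location parameter and $\sigma$ a scale parameter.

LEMMA 4.1. The characteristic function $\psi(t, \theta, \sigma)$ of $p(x, \theta, \sigma)$ is of the form
$\psi(t, \theta, \sigma) = e^{i\theta t} h(t\sigma)$, where $h(t\sigma) = \psi(t\sigma, 0, 1)$.

PROOF.

$\psi(t, \theta, \sigma) = \frac{1}{\sigma} \int_{-\infty}^{\infty} e^{itx} p\!\left(\frac{x - \theta}{\sigma}\right) dx = e^{i\theta t} \int_{-\infty}^{\infty} e^{it\sigma z} p(z)\, dz$

where $z = (x - \theta)/\sigma$. If we let $h(t\sigma) = \int_{-\infty}^{\infty} e^{it\sigma z} p(z)\, dz$, the lemma follows.

THEOREM 4.1. If $h(t\sigma_1) h(t\sigma_2) = h(t\sigma_3)$, where $\sigma_1$, $\sigma_2$, and $\sigma_3$ are values of $\sigma$
independent of t, then $\bar{x}$ is distributed according to a member of $\mathcal{F}$.

PROOF.

(4.1)  $E(e^{it\bar{x}}) = \int_{M_n} e^{it\bar{x}} p_n(X_n, \theta, \sigma)\, dX_n = \prod_{j=1}^{n} \int_{-\infty}^{\infty} e^{itx_j/n} p(x_j, \theta, \sigma)\, dx_j = e^{i\theta t} [h(t\sigma/n)]^n = e^{i\theta t} h(t\sigma').$

Hence by Lemma 4.1 the probability density function of $\bar{x}$ belongs to $\mathcal{F}$.

THEOREM 4.2. If $h(t\sigma_1) h(t\sigma_2) = h(t\sigma_3)$, where $\sigma_1$, $\sigma_2$, and $\sigma_3$ are values of $\sigma$
independent of t, then $\bar{x}$ is a density unbiased estimate of the location parameter $\theta$.

PROOF.

$\psi_{\bar{x}}(t, \theta, \sigma) = \int_{-\infty}^{\infty} e^{itx} \phi_{\bar{x}}(x, \theta, \sigma)\, dx = \int_{-\infty}^{\infty} e^{itx} \int_{M_n} p(x, \bar{x}, \sigma)\, p_n(X_n, \theta, \sigma)\, dX_n\, dx$
$= \int_{M_n} e^{it\bar{x}} h(t\sigma)\, p_n(X_n, \theta, \sigma)\, dX_n = h(t\sigma) E(e^{it\bar{x}}).$

From (4.1) we have $E(e^{it\bar{x}}) = e^{i\theta t} h(t\sigma')$ and so

$\psi_{\bar{x}}(t, \theta, \sigma) = e^{i\theta t} h(t\sigma) h(t\sigma') = e^{i\theta t} h(t\sigma'').$

Therefore $\phi_{\bar{x}}(x, \theta, \sigma) \in \mathcal{F}$ and the theorem follows directly.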

EXAMPLE. Let $\mathcal{F}$ denote the two-parameter family of Cauchy distributions
given by

$p(x, \theta, \sigma) = \frac{1}{\pi\sigma} \cdot \frac{1}{1 + (x - \theta)^2/\sigma^2}.$

In this case the characteristic function is $\psi(t, \theta, \sigma) = e^{i\theta t - \sigma|t|}$, so that
$h(t\sigma) = e^{-\sigma|t|}$. It is immediately verified that $h(t\sigma_1) h(t\sigma_2) = h(t\sigma_3)$, where
$\sigma_3 = \sigma_1 + \sigma_2$. Therefore, by Theorem 4.1, $\bar{x}$ has a Cauchy distribution. In fact,
since in this case $E(e^{it\bar{x}}) = e^{i\theta t}[h(t\sigma/n)]^n = e^{i\theta t} h(t\sigma)$, $\bar{x}$ has exactly the same
distribution as does x itself, namely $p(x, \theta, \sigma)$. It follows from Theorem 4.2 that $\bar{x}$ is a
density unbiased point estimate of the location parameter $\theta$.
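
The characteristic-function identities used in the Cauchy example can be verified directly. The following sketch is illustrative; the function names and numerical inputs are assumptions. It checks that the mean of n independent Cauchy observations has the same characteristic function as a single observation, and that the mean probability density function generated by the sample mean has scale $2\sigma$.

```python
import cmath


def cauchy_cf(t, theta, sigma):
    """Characteristic function exp(i*theta*t - sigma*|t|) of the Cauchy family."""
    return cmath.exp(1j * theta * t - sigma * abs(t))


def cf_of_sample_mean(t, theta, sigma, n):
    """E exp(i*t*mean) for n independent observations: the n-th power of psi(t/n)."""
    return cauchy_cf(t / n, theta, sigma) ** n


def cf_of_mean_density(t, theta, sigma, n):
    """h(t*sigma) * E exp(i*t*mean), as in the proof of Theorem 4.2,
    with h(t*sigma) = exp(-sigma*|t|) for the Cauchy family."""
    return cmath.exp(-sigma * abs(t)) * cf_of_sample_mean(t, theta, sigma, n)


theta, sigma, n = 1.5, 0.8, 7
t_values = (-2.0, -0.3, 0.4, 3.1)
same_as_single = [abs(cf_of_sample_mean(t, theta, sigma, n) - cauchy_cf(t, theta, sigma))
                  for t in t_values]
doubled_scale = [abs(cf_of_mean_density(t, theta, sigma, n) - cauchy_cf(t, theta, 2 * sigma))
                 for t in t_values]
```

Both lists of discrepancies should be zero up to rounding, so here $\sigma' = 2\sigma$ whatever the sample size.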

It is readily seen that Theorems 4.1 and 4.2 are valid also when $\bar{x}$ is replaced
by any linear homogeneous estimate $\sum_{i=1}^{n} a_i x_i$, where $\sum_{i=1}^{n} a_i = 1$.


SOME PROPERTIES OF BETA AND GAMMA DISTRIBUTIONS

By M. V. JAMBUNATHAN
University of Mysore
1. Summary and Introduction. The object of this paper is to present certain
important properties of the Gamma distribution and the two kinds of Beta
distributions, and to indicate certain useful applications of these two to sampling
problems. The distribution of the Studentised $D^2$-statistic under the null
hypothesis is obtained in two different ways.


2. The Gamma distribution. If a random variable x has probability density

(2.1)  $\frac{1}{\Gamma(a)} e^{-x} x^{a-1}$,  $0 \le x < \infty$,

then x is said to have a Gamma distribution; furthermore, x is called a Gamma
variate with parameter a, and is symbolically written $\gamma(a)$. The Gamma
distribution is known to possess, among others, the mean conserving property
(m.c.p.), provided the variates are independent.¹ Symbolically

$\gamma(a) + \gamma(b) = \gamma(a + b).$

If x is a Gamma variate with parameter a, then 2x is distributed as $\chi^2$ with 2a
degrees of freedom.

3. Beta distribution of the first kind. A random variable having the
probability density

(3.1)  $\frac{1}{B(a, b)} x^{a-1} (1 - x)^{b-1}$,  $0 \le x \le 1$,

is said to have a Beta distribution of the first kind, and x is called a Beta variate
of the first kind with parameters a and b, written symbolically $\beta_1(a, b)$.

Received 9/25/52, revised 1/20/54.

¹ This necessary proviso which was left out in the original paper has since been inserted
at the suggestion of a referee to whom the author is thankful for several other suggestions
besides.
Under certain mild conditions, the product of two independent Beta variates 
of the first kind also follows the same distribution, as stated in the following 
theorem. 

THEOREM 1. If x and y are two independent Beta variates of the first kind with 
parameters (a, b) and (a + b, c), then the product xy is a $\beta_1(a, b + c)$ variate. 

The theorem is proved by transforming variables, the transformation being 
u = xy, v = (y − u)/(1 − u). When this is done, u and v are independent 
variables and u is a $\beta_1(a, b + c)$ variate, v is a $\beta_1(b, c)$ variate. 
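
As a rough numerical illustration of Theorem 1 and of the transformation used in its proof, the following sketch simulates u = xy and v = (y − u)/(1 − u). The parameter values and the seed are assumptions of this example. It checks only the means of u and v and their sample covariance, which should be close to a/(a + b + c), b/(b + c) and zero respectively.

```python
import random

rng = random.Random(2024)                 # arbitrary fixed seed
a, b, c, draws = 2.0, 1.5, 3.0, 100_000   # illustrative parameters

us, vs = [], []
for _ in range(draws):
    x = rng.betavariate(a, b)        # beta_1(a, b)
    y = rng.betavariate(a + b, c)    # beta_1(a + b, c), independent of x
    u = x * y
    v = (y - u) / (1.0 - u)
    us.append(u)
    vs.append(v)

mean_u = sum(us) / draws
mean_v = sum(vs) / draws
cov_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / (draws - 1)

print(mean_u, a / (a + b + c))   # mean of a beta_1(a, b + c) variate
print(mean_v, b / (b + c))       # mean of a beta_1(b, c) variate
print(cov_uv)                    # near zero if u and v are uncorrelated
```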

The result of the theorem can be generalized, as shown by Rao [2]. We then get 

THEOREM 2. If $x_1, x_2, \cdots, x_p$ be p independent Beta variates of the first kind 
with parameters $(a_i, b_i)$ for i = 1, 2, ..., p, and if $a_{i+1} = a_i + b_i$ for 
i = 1, 2, ..., (p − 1), then the product $x_1 x_2 \cdots x_p$ is a $\beta_1$ variate with parameters 
$a_1$ and $b = \sum_{i=1}^{p} b_i$. 

This theorem is proved by the repeated application of Theorem 1. The 
properties of Gamma variates in conjunction with the results of Theorems 1 and 2 
lead to the following 

THEOREM 3. If $x_i$, $y_i$ for i = 1, 2, ..., p be independent Gamma variates with 
parameters $a_i$, $b_i$ which are connected by the relation $a_{i+1} = a_i + b_i$, for 
i = 1, 2, ..., (p − 1), then the product $\prod_{i=1}^{p} x_i/(x_i + y_i)$ is a $\beta_1$ variate with 
parameters $(a_1, b)$ where $b = \sum_{i=1}^{p} b_i$. 

This may be expressed symbolically as 


$\frac{\gamma(a_1)}{\gamma(a_1 + b_1)} \cdot \frac{\gamma(a_1 + b_1)}{\gamma(a_1 + b_1 + b_2)} \cdots \frac{\gamma(a_1 + b_1 + \cdots + b_{p-1})}{\gamma(a_1 + b_1 + \cdots + b_p)} = \beta_1\Big(a_1, \sum_{i=1}^{p} b_i\Big) = \frac{\gamma(a_1)}{\gamma(a_1 + b_1 + \cdots + b_p)}.$ 

Since the variates are independent, and have Gamma distributions, we note 
that $x_i/(x_i + y_i)$ is a $\beta_1(a_i, b_i)$ variate. Application of Theorem 2 establishes 
the result. 


4. Beta distribution of the second kind. A random variable x is said to have 
a Beta distribution of the second kind with parameters a and b if x has the 
probability density 

$\frac{x^{a-1}}{B(a, b)\,(1 + x)^{a+b}}, \qquad 0 \le x < \infty,$ 

and is symbolically written $\beta_2(a, b)$. 

The two kinds of Beta distributions are connected by a transformation. 
If x is a $\beta_1(a, b)$ variate, then u = (1 − x)/x is a $\beta_2(b, a)$ variate. Conversely, 
if u is a $\beta_2(b, a)$ variate, then x = 1/(1 + u) is a $\beta_1(a, b)$ variate. This mutual 
relationship enables us to deduce a new result connecting the two distributions. 

THEOREM 4. If u = (1 + y)/(1 + x), and if u is a $\beta_1(b - d, d)$ variate while y 
is a $\beta_2(a, b)$ variate, then x is a $\beta_2(a + d, b - d)$ variate, provided that u and y are 
independent. 

Since y is a $\beta_2(a, b)$ variate, 1/(1 + y) is a $\beta_1(b, a)$ variate. Therefore 1/(1 + x) = 
u[1/(1 + y)] is the product of two $\beta_1$ variates with parameters (b − d, d) and 
(b, a), and is, therefore, by Theorem 1, equal to a $\beta_1(b - d, a + d)$ variate. 
Hence x is a $\beta_2(a + d, b - d)$ variate. 
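
A simulation sketch of Theorem 4 follows. It assumes the standard fact that the ratio of two independent Gamma variates with parameters a and b is a $\beta_2(a, b)$ variate. The parameter values, the seed, and the helper name `beta2_variate` are choices of this example. Only a few moments are compared with those implied by the theorem.

```python
import random

def beta2_variate(a, b, rng):
    """beta_2(a, b) variate generated as gamma(a) / gamma(b) (assumed construction)."""
    return rng.gammavariate(a, 1.0) / rng.gammavariate(b, 1.0)

rng = random.Random(31415)                # arbitrary fixed seed
a, b, d, draws = 2.0, 6.0, 1.5, 100_000   # illustrative parameters with b - d > 1

xs = []
for _ in range(draws):
    y = beta2_variate(a, b, rng)          # beta_2(a, b)
    u = rng.betavariate(b - d, d)         # beta_1(b - d, d), independent of y
    xs.append((1.0 + y) / u - 1.0)        # solve u = (1 + y)/(1 + x) for x

# If x is beta_2(a + d, b - d), then w = 1/(1 + x) is beta_1(b - d, a + d).
ws = [1.0 / (1.0 + x) for x in xs]
mean_w = sum(ws) / draws
second_w = sum(w * w for w in ws) / draws
mean_x = sum(xs) / draws

alpha, beta = b - d, a + d
expected_mean_w = alpha / (alpha + beta)
expected_second_w = alpha * (alpha + 1) / ((alpha + beta) * (alpha + beta + 1))
expected_mean_x = (a + d) / (b - d - 1)   # mean of beta_2(a + d, b - d)

print(mean_w, expected_mean_w)
print(second_w, expected_second_w)
print(mean_x, expected_mean_x)
```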


5. Distribution of Studentised $D^2$ under the null hypothesis. The results 
deduced in the preceding sections enable us to obtain easily the derivation of 
the distribution of the Studentised $D^2$-statistic. Consider two independent random 
samples of size $n_1$ and $n_2$ drawn from two multivariate normal populations. 
It is assumed in what follows that the two populations possess the same 
covariance matrix $A = \|\sigma_{ij}\|$ for i, j = 1, 2, ..., p. 

Let $n = n_1 + n_2$ and $1/c = 1/n_1 + 1/n_2$. Also let $\bar{x}_i^{(1)}$ and $\bar{x}_i^{(2)}$ be the mean 
values of the ith character for the first and the second samples respectively. 

Let $S_{ij}$ be the pooled corrected sum of products within the two samples for 
the variates $x_i$ and $x_j$; that is, $S_{ij} = S_{ij}^{(1)} + S_{ij}^{(2)}$, where 

(5.1) $S_{ij}^{(1)} = \sum_{r=1}^{n_1} (x_{ir}^{(1)} - \bar{x}_i^{(1)})(x_{jr}^{(1)} - \bar{x}_j^{(1)}),$ 

the upper suffix (1) indicating the first sample; $S_{ij}^{(2)}$ is defined similarly. The 
statistic $s_{ij} = S_{ij}/(n - 2)$ is an estimate of $\sigma_{ij}$, the covariance between $x_i$ and 
$x_j$ in the two populations. If $(S^{ij})$ be the matrix inverse to $(S_{ij})$, and $(s^{ij})$ the 
matrix inverse to $(s_{ij})$, we might call $(s^{ij})$ the estimate matrix of the covariances. 
The statistic appropriate for testing the significance of the difference between 
the means of the several characters is given by 

(5.2) $D_p^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij} d_i d_j,$ 

where $d_i$ is the difference between the means of the ith character in the two 
samples, and $d_j$ the difference between the means of the jth character, see [1] 
and [3]. This is called the Studentised $D^2$-statistic. Two ways of obtaining the 
derivation of the distribution of this statistic are given below. 


FIRST METHOD. Let 

(5.3) $R_k = |S_{ij}|_k / |S_{ij}|_{k-1}, \qquad k = 1, 2, \cdots, p,$ 

where, following Rao [2], the symbol $|S_{ij}|_k$ is used to denote the determinant 
of order k of the matrix $(S_{ij})$, obtained by giving i, j the values 1, 2, ..., k 
(k ≤ p). The symbol $|S_{ij}|_0$ may, without causing inconsistency, be defined 
to be equal to unity. Also let 

(5.4) $R'_k = |S'_{ij}|_k / |S'_{ij}|_{k-1},$ 

where $S'_{ij}$ is the corrected sum of products, that is, 

$S'_{ij} = \sum (x_i - \bar{x}_i)(x_j - \bar{x}_j)$ 

summed over all the $(n_1 + n_2)$ sample elements, $\bar{x}_i$ being the pooled means of 
both the samples. It is easy to show that $S'_{ij} = S_{ij} + c\, d_i d_j$, whence 

(5.5) $R'_k = \frac{|S_{ij} + c\, d_i d_j|_k}{|S_{ij} + c\, d_i d_j|_{k-1}}.$ 

But 

(5.6) $|S_{ij} + c\, d_i d_j|_k = |S_{ij}|_k \Big(1 + c \sum_{i=1}^{k} \sum_{j=1}^{k} d_i d_j S_k^{ij}\Big) = |S_{ij}|_k (1 + C D_k^2),$ 

where $(S_k^{ij})$ is the matrix inverse to $(S_{ij})_k$, and $C = c/(n - 2)$, since 
$s_{ij} = S_{ij}/(n - 2)$. The quantities $R_k$ and $(R'_k - R_k)$ when divided by $2\sigma_k$ are 
independently distributed as Gamma variates with parameters ½(n − k − 1) and ½ 
respectively for k = 1, 2, ..., p, if the true means of the characters of the 
populations are equal. Hence $R_k/R'_k$ is a $\beta_1(\tfrac12(n - k - 1), \tfrac12)$ variate. But by 
(5.3) and (5.5) in combination with (5.6), 

(5.7) $R_k/R'_k = (1 + C D_{k-1}^2)/(1 + C D_k^2), \qquad k = 1, 2, \cdots, p.$ 


It can be shown that $R_p, R_{p-1}, \cdots, R_2, R_1$ are all independently distributed, 
as are also 

$\frac{1 + CD_{p-1}^2}{1 + CD_p^2}, \quad \frac{1 + CD_{p-2}^2}{1 + CD_{p-1}^2}, \quad \cdots, \quad \frac{1 + CD_1^2}{1 + CD_2^2}, \quad \frac{1}{1 + CD_1^2}.$ 

Now putting a = ½(p − 1), b = ½(n − p), d = ½ in Theorem 4, we find that, 
assuming $CD_{p-1}^2$ is a $\beta_2$-variate with parameters ½(p − 1), ½(n − p), the quantity 
$CD_p^2$ is a $\beta_2$-variate with parameters ½p, ½(n − p − 1). It is well known that 
$CD_1^2$ is a $\beta_2$-variate with parameters ½, ½(n − 2). Hence, by the principle of 
finite induction, $CD_p^2$ is a $\beta_2$-variate with parameters ½p, ½(n − p − 1), and 
therefore its distribution is 

$\frac{(CD_p^2)^{(p-2)/2}\, d(CD_p^2)}{B(\tfrac12 p, \tfrac12(n - p - 1))\,(1 + CD_p^2)^{(n-1)/2}},$ 

which simplifies to 

(5.8) $\frac{2\, C^{p/2} D_p^{p-1}\, dD_p}{B(\tfrac12 p, \tfrac12(n - p - 1))\,(1 + CD_p^2)^{(n-1)/2}}.$ 


SECOND METHOD. Another method which is capable of yielding some additional 
results is the following. From (5.3) it easily follows that 

(5.9) $S = |S_{ij}|_p = R_p \cdot R_{p-1} \cdots R_1.$ 

Similarly 

(5.10) $S' = |S_{ij} + c\, d_i d_j|_p = R'_p \cdot R'_{p-1} \cdots R'_1.$ 

Therefore 

(5.11) $\frac{S}{S'} = \prod_{i=1}^{p} \frac{R_i}{R'_i} = \prod_{i=1}^{p} \beta_1\big(\tfrac12(n - i - 1), \tfrac12\big),$ 

employing the symbolic notation of Section 3. By Theorem 2, the last member 
of the above equation is seen to reduce to a $\beta_1$-variate with parameters 
½(n − p − 1), ½p, so that (S/S') follows the Beta distribution of the first kind 
with parameters ½(n − p − 1) and ½p. This result has been obtained by Wilks 
[4], by deriving expressions for the moments of the distribution of (S/S'). 
The above is a simple and direct method of establishing the distribution of the 
statistic (S/S'). 

From (5.6) it readily follows that (S/S') is equal to $1/(1 + CD_p^2)$, so that 
the latter is distributed as a $\beta_1$-variate with parameters ½(n − p − 1) and ½p, 
whence $CD_p^2$ is a $\beta_2$-variate with parameters ½p and ½(n − p − 1), leading to the 
distribution as shown in (5.8). 

ON SOME FUNCTIONS INVOLVING MILL'S RATIO 


By D. F. BARROW AND A. C. COHEN, JR. 
University of Georgia 


1. Introduction and Summary. In this note, we prove that, for all (finite) 
values of h, 

(1) $\psi(h) = \frac{m_2}{m_1^2} = \frac{1 - h(Z - h)}{(Z - h)^2}$ 

is monotonic increasing, that 

(2) $2m_1^2 - m_2 > 0,$ 

and that 

(3) $1 < \psi(h) < 2,$ 

where Z is the reciprocal of Mill's ratio, 

(4) $Z(h) = e^{-h^2/2} \Big/ \int_h^{\infty} e^{-t^2/2}\, dt,$ 

and where $m_1$ and $m_2$ are respectively the first and second moments of a singly 
truncated normal distribution about the point of truncation. 

The function $\psi(h)$ arises in connection with maximum likelihood estimation of 
population parameters from singly truncated normal samples (cf. for example 
[1] and references cited therein). The inequality (2) arises in connection with 
three-moment estimates based on samples of the same type (cf. [2] and [3]). 


2. Some preliminary results. To prove that $\psi(h)$ is monotonic increasing, it 
is sufficient to establish that $\psi'(h) > 0$. Differentiating (4) gives 

(5) $Z' = Z(Z - h).$ 

Using this result and differentiating (1), we obtain 

(6) $\psi'(h) = [hZ(Z - h)^2 - 3Z(Z - h) + 2]/(Z - h)^3.$ 

For subsequent use, it can be shown (cf. for example Sampford [4]) that 

(7) $0 < Z' < 1, \qquad \lim_{h \to \infty} Z' = 1, \qquad \lim_{h \to -\infty} Z' = 0,$ 

(8) $(Z - h) > 0, \qquad \lim_{h \to \infty} (Z - h) = 0, \qquad \lim_{h \to -\infty} (Z - h) = \infty,$ 

(9) $h(Z - h) < 1, \qquad \lim_{h \to \infty} h(Z - h) = 1, \qquad \lim_{h \to -\infty} h(Z - h) = -\infty.$ 


3. Proof that $\psi'(h) > 0$. Since from (8), $(Z - h) > 0$, a sufficient condition 
that $\psi'(h) > 0$ is that 

$\theta(h) = [hZ(Z - h)^2 - 3Z(Z - h) + 2] > 0.$ 

To prove this latter inequality, we first write $\theta(h)$ in the form 

(10) $\theta(h) = Z^2(Z - h)^2 - Z(Z - h)^3 - 3Z(Z - h) + 2,$ 

which follows on substituting $h = Z - (Z - h)$ in the first term. 
Using (7), (8), and (9), it can be shown that $\lim_{h \to \infty} \theta(h) = 0$. Therefore to prove 
$\theta(h) > 0$, it would be sufficient though not necessary to show that $\theta'(h) < 0$. 
Using (5), we find 

(11) $\theta'(h) = -Z(Z - h)^4 - Z^2(Z - h)^3 + 2Z^3(Z - h)^2 - 5Z^2(Z - h) + 3Z.$ 

Proof that $\theta'(h) < 0$ does not follow readily, so we introduce the auxiliary 
function 

(12) $g(h) = e^{w(h)}\, \theta(h),$ 

where 

(13) $w(h) = -\int^{h} Z(x)\, dx,$ 







and thus 

(14) $w'(h) = -Z(h).$ 

Since $e^{w(h)} > 0$, a necessary and sufficient condition that $\theta(h) > 0$ is that $g(h) > 0$. 
It can be shown that $\lim_{h \to \infty} g(h) = 0$, and consequently to prove $\theta(h) > 0$, it is 
sufficient to show that $g'(h) < 0$. 

On differentiating (12), we obtain 

(15) $g'(h) = e^{w(h)}[\theta'(h) - Z\theta(h)].$ 

Again using the fact that $e^{w(h)} > 0$, it follows that $g'(h) < 0$ if and only if 

(16) $\theta'(h) - Z\theta(h) < 0.$ 

From (10) and (11), we have 

(17) $\theta'(h) - Z\theta(h) = -Z(Z - h)^4 + Z^3(Z - h)^2 - 2Z^2(Z - h) + Z$ 
$= Z\{[Z(Z - h) - 1]^2 - (Z - h)^4\}$ 
$= Z\{Z(Z - h) - 1 - (Z - h)^2\}\{Z(Z - h) - 1 + (Z - h)^2\}$ 
$= Z\{h(Z - h) - 1\}\{(2Z - h)(Z - h) - 1\}.$ 

Sampford (loc. cit.) proved² 

(18) $(2Z - h)(Z - h) - 1 > 0,$ for all finite h. 


From (4), $Z > 0$, and from (9), $h(Z - h) - 1 < 0$. Therefore $\theta'(h) - Z\theta(h) < 0$, 
and accordingly $\psi'(h) > 0$ for all finite h. With this result, the proof that $\psi(h)$ 
is monotonic increasing, for all finite h, is complete. 
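
The monotonicity of the function in (1) and the positivity of the numerator of (6) can be checked numerically on a grid. The sketch below is illustrative only: it evaluates Z through the complementary error function, and the grid from −6 to 6 is an arbitrary choice of this example, so it supports the result without replacing the proof.

```python
import math

def Z(h):
    """Reciprocal of Mill's ratio, equation (4), via the complementary error function."""
    upper_tail = math.sqrt(math.pi / 2.0) * math.erfc(h / math.sqrt(2.0))
    return math.exp(-0.5 * h * h) / upper_tail

def psi(h):
    """Equation (1): [1 - h(Z - h)] / (Z - h)^2."""
    w = Z(h) - h
    return (1.0 - h * w) / (w * w)

def theta(h):
    """Numerator of (6): hZ(Z - h)^2 - 3Z(Z - h) + 2."""
    z = Z(h)
    w = z - h
    return h * z * w * w - 3.0 * z * w + 2.0

grid = [k / 4.0 for k in range(-24, 25)]   # h = -6.00, -5.75, ..., 6.00
psi_values = [psi(h) for h in grid]

increasing = all(p < q for p, q in zip(psi_values, psi_values[1:]))
theta_positive = all(theta(h) > 0.0 for h in grid)

print(increasing, theta_positive)
```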


4. Proof that $2m_1^2 - m_2 > 0$. As shown in [1], $m_1$ and $m_2$ may be expressed as 

(19) $m_1 = \sigma[Z - h], \qquad m_2 = \sigma^2[1 - h(Z - h)],$ 

and it follows that 

(20) $2m_1^2 - m_2 = \sigma^2\{2(Z - h)^2 + h(Z - h) - 1\}.$ 

Since $\sigma > 0$, it is sufficient to demonstrate that the expression within brackets 
on the right side, above, is positive. After certain simplifications, we obtain 

$[2(Z - h)^2 + h(Z - h) - 1] = [(Z - h)(2Z - 2h + h) - 1]$ 
$= [(2Z - h)(Z - h) - 1] > 0,$ 

which is Sampford's inequality (18), and the proof is complete. 
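
The expressions (19) and the inequality of Section 4 can be compared with moments obtained by direct numerical integration. The sketch below assumes σ = 1 and a standard normal variable truncated on the left at h, with moments taken about h. The midpoint rule, the upper cutoff of 10 and the three values of h are choices of this example.

```python
import math

def Z(h):
    """Reciprocal of Mill's ratio, equation (4)."""
    upper_tail = math.sqrt(math.pi / 2.0) * math.erfc(h / math.sqrt(2.0))
    return math.exp(-0.5 * h * h) / upper_tail

def truncated_moments(h, upper=10.0, steps=20000):
    """First two moments about h of a standard normal truncated to t > h (midpoint rule)."""
    width = (upper - h) / steps
    mass = first = second = 0.0
    for k in range(steps):
        t = h + (k + 0.5) * width
        weight = math.exp(-0.5 * t * t) * width
        mass += weight
        first += (t - h) * weight
        second += (t - h) ** 2 * weight
    return first / mass, second / mass

checks = []
for h in (-2.0, 0.0, 1.5):
    m1, m2 = truncated_moments(h)
    w = Z(h) - h
    sampford = (2.0 * Z(h) - h) * w - 1.0          # left side of (18)
    checks.append((h, m1 - w, m2 - (1.0 - h * w), 2.0 * m1 ** 2 - m2, sampford))

for row in checks:
    print(row)
```

In each row the second and third entries should be near zero, and the last two entries should agree and be positive.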
5. Proof that $1 < \psi(h) < 2$. From (19), (5), and (7), it follows that 

$m_2 - m_1^2 = \sigma^2[1 - Z(Z - h)] = \sigma^2(1 - Z') > 0.$ 

Using this result and inequality (2), which was established in Section 4, we have 
$m_1^2 < m_2 < 2m_1^2$, and the required result follows immediately on dividing by $m_1^2$. 

We also note that $\lim_{h \to -\infty} \psi(h) = 1$, and $\lim_{h \to \infty} \psi(h) = 2$. Thus no narrower 
limits can be found. To obtain these limits, we use the result $\lim_{h \to -\infty} Z/h = 0$, 
which follows from $\lim_{h \to -\infty} Z e^{h^2/2} = \big[\int_{-\infty}^{\infty} e^{-t^2/2}\, dt\big]^{-1} = (\sqrt{2\pi})^{-1}$. Thereby we have 

$\lim_{h \to -\infty} \psi(h) = \lim_{h \to -\infty} \frac{1/h^2 - Z/h + 1}{(Z/h - 1)^2} = \frac{0 - 0 + 1}{(0 - 1)^2} = 1,$ 

and 

$\lim_{h \to \infty} \psi(h) = \lim_{h \to \infty} \frac{e^{w(h)}[1 - h(Z - h)]}{e^{w(h)}(Z - h)^2},$ 

which is indeterminate of the form 0/0 as given. Using L'Hospital's rule and 
making certain obvious simplifications, we obtain 

$\lim_{h \to \infty} \psi(h) = \lim_{h \to \infty} \frac{2}{2 - Z(Z - h)} = \frac{2}{2 - 1} = 2.$ 

² This inequality can also be established by employing the multiplier $e^{w(h)}$ in a role 
similar to that in which it appears above. 

REFERENCES 

[1] A. C. Cohen, Jr., "Estimating the mean and variance of normal populations from 
singly truncated and doubly truncated samples," Ann. Math. Stat., Vol. 21 
(1950), pp. 557-569. 

[2] A. C. Cohen, Jr., "On estimating the mean and variance of singly truncated normal 
distributions from the first three sample moments," Ann. Inst. Stat. Math., 
Vol. 3 (1951), pp. 37-44. 

[3] A. C. Cohen, Jr., "Estimation of parameters in truncated Pearson frequency 
distributions," Ann. Math. Stat., Vol. 22 (1951), pp. 256-265. 

[4] M. R. Sampford, "Some inequalities on Mill's ratio and related functions," Ann. 
Math. Stat., Vol. 24 (1953), pp. 130-132. 

[5] Des Raj, "On estimating the parameters of normal populations from singly truncated 
samples," Ganita, Vol. 3 (1952), pp. 41-57. 




ABSTRACTS OF PAPERS 
(Abstracts of papers presented at the Ithaca meeting of the Institute, March 18-20, 1954) 


1. Confidence Region Procedures Based on the Logarithm of the Likelihood. 
Carl R. Ohman, Princeton University. 


Let $f(x, \theta)$ be a probability function where $\theta$ is one of a set of permissible parameter 
points $\theta = (\theta_1, \cdots, \theta_h)$ contained in some subspace of $R_h$. A sample $(x_1, \cdots, x_n)$ of size n 
is observed and a set of h functions, $\varphi_j = (1/\sqrt{n}) \sum_i k_{ij} L_i$, j = 1, ..., h < n, computed, 
where $L_i = \partial \log f/\partial \theta_i$, $f = \prod_{i=1}^{n} f(x_i, \theta)$, and the $k_{ij}$ are chosen so that $E(\varphi_i) = 0$, 
$E(\varphi_i \varphi_j) = \delta_{ij}$. For a given sample, the $\varphi_j$ are functions of $\theta$, and $(\varphi_1(\theta), \cdots, \varphi_h(\theta))$ is a 
point in the pivotal space $\Phi \subset R_h$. If a region W can be constructed in $\Phi$ so that 
$\Pr\{(\varphi_1, \cdots, \varphi_h) \in W\} = \alpha$ independently of $\theta_0$, the corresponding region in the parameter 
space will be a 100α per cent confidence region for $\theta_0$. If the $\varphi_i$ are normally distributed, 
the sphere $W = \{\sum \varphi_i^2 \le \chi^2\}$ is suitable. Otherwise, an approximate pivotal region can be 
constructed using one of several modified Cornish-Fisher procedures. The details of one 
such procedure are given and several examples are discussed. (This procedure differs 
somewhat from that described by M. S. Bartlett, Biometrika, Vol. 40, (1953) pp. 12, 306.) The 
remainder of the paper discusses (i) the regularity conditions under which these procedures 
are valid, (ii) the large and small sample properties of the resulting regions, (iii) the 
possibility of improvement using higher order derivatives of f, and (iv) the problem of nuisance 
parameters. 


2. A One-Sided Confidence Interval for an Unknown Distribution Function. 
Herbert Robbins, Columbia University. 


Theorem. Let $x_1, \cdots, x_n$ be independent with common continuous c.d.f. F(x), let $F_n(x)$ 
be the sample c.d.f. = (number of $x_i \le x$)/n, and let t be any constant between 0 and 1. 
Then $\Pr[F(x) \ge tF_n(x) \text{ for all } -\infty < x < \infty] = 1 - t$. Proof. By the usual transformation, 
$z_i = F(x_i)$, the assertion need only be established when the $z_i$ are uniformly distributed on 
(0, 1]. In that case we have $\Pr[F_n(z) \le z/t \text{ for all } -\infty < z < \infty] =$ 

$n!\, \Pr[z_1 \le \cdots \le z_n \text{ and } z_j \ge jt/n,\ j = 1, \cdots, n] = n! \int_t^1 \int_{(n-1)t/n}^{z_n} \cdots \int_{t/n}^{z_2} dz_1 \cdots dz_n$ 
$= n!\, [x^n/n! - t x^{n-1}/n!]_t^1 = 1 - t.$ 

3. The Mean Successive Difference in Samples from an Exponential Population. 
P. G. Moore, University College, London, and Princeton University. 


A random sample of size n is drawn from the exponential population having probability 
density function $p(x) = \theta^{-1} \exp\{-(x - A)/\theta\}$ for $x \ge A$ and zero elsewhere. Let $x_1, x_2, \cdots, x_n$ 
be the n observations in their correct temporal order. The mean successive difference is 
defined as $d = \sum_{i=1}^{n-1} |x_i - x_{i+1}|/(n - 1)$. The first four moments of this expression are 
found in order to obtain approximate significance points for d. These may be used, if $\theta$ is 
known, to test the hypothesis of homogeneity in the original sequence of observations. The 
properties of d for the cases where A, or $\theta$, is not constant from observation to observation 
but varies in some way are also investigated. Application of the foregoing suggests the use 
of the statistic A = d/% as a test for homogeneity which is independent of the population 
parameter $\theta$ in the case where A is known. In the final part of the paper, alternative 
statistics based on $|x_i - x_{i+r}|$ or $|\Delta^r x_i|$ for r = 2, 3, ... are discussed, and also the properties 
of the mean successive difference when the sampling is not from an exponential but the 
general $\chi^2$ or Pearson Type III population. 
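
A minimal sketch of the mean successive difference is given here for concreteness. The function name, the parameter values and the seed are assumptions of this example. It relies on the known fact that the absolute difference of two independent exponential observations with scale θ has mean θ, so that d has expectation θ whatever the value of A. The moments and significance points studied in the paper are not reproduced.

```python
import random

def mean_successive_difference(xs):
    """d = sum of |x_i - x_{i+1}| over consecutive pairs, divided by n - 1."""
    return sum(abs(u - v) for u, v in zip(xs, xs[1:])) / (len(xs) - 1)

rng = random.Random(99)            # arbitrary fixed seed
theta, A, n = 2.0, 5.0, 50_000     # illustrative scale, location, and sample size

# Exponential observations with density (1/theta) exp{-(x - A)/theta} for x >= A.
sample = [A + rng.expovariate(1.0 / theta) for _ in range(n)]
d = mean_successive_difference(sample)

print(d)                                            # close to theta
print(mean_successive_difference([1.0, 4.0, 2.0]))  # (3 + 2) / 2
```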


4. Application of the Duality Theorem of Linear Programming to Testing 
Hypotheses. Howard Raiffa, Columbia University. 


Consider a finite sample space and finite parameter space. Let $\omega_0 = \{\omega_{01}, \omega_{02}, \cdots, \omega_{0r}\}$ 
and $\omega_1 = \{\omega_{11}, \omega_{12}, \cdots, \omega_{1s}\}$ be disjoint subsets of the parameter space. For any 
randomized test, $\varphi$, of $\omega_0$ against $\omega_1$ let $\alpha_i(\varphi) = E(\varphi \mid \omega_{0i})$ for i = 1, 2, ..., r, and $\beta_j(\varphi) = E(1 - \varphi \mid \omega_{1j})$ 
for j = 1, 2, ..., s. We apply the duality theorem of linear programming to find $\varphi$ which, 
subject to the condition that $\alpha_i(\varphi) \le \alpha_0$ for i = 1, 2, ..., r, is a minimizer of $\max_j \beta_j(\varphi)$. 
The results lead naturally to the notion of least favorable a priori distributions over $\omega_0$ 
and $\omega_1$; the results are interpreted geometrically. The value, $\min_\varphi \max_j \beta_j(\varphi)$, has an 
interpretation as a generalized distance in 2 from the origin to a displaced positive orthant. 
The distance is the shortest path constrained to follow a ray of a cone (which ray is 
associated with the notion of the least favorable case) and directions parallel to the axis. Existence 
results, approximation results based on gaps, and an algorithm for solving such problems 
are considered. 

5. Multiple Points of Paths of Brownian Motion in the Plane. Aryeh Dvoretzky, 
Columbia University; P. Erdős, Notre Dame University; and S. Kakutani, 
Yale University. 


The main result established here is that almost all two-dimensional (mathematical) 
Brownian motion paths in the plane possess multiple points of arbitrary high finite 
multiplicity. The method of proof is similar in part to that of a previous paper (Acta Sci. Math. 
Szeged, T. 12 (1950), pp. 75-81). Combining the results of the two papers it is known that, 
with probability 1: Brownian paths in four-dimensional or higher space have no double 
points; Brownian paths in three-dimensional space have double points; Brownian paths 
in two-dimensional space have points of arbitrary high finite multiplicity. The problems 
of the existence of multiple (in particular triple) points in three-dimensional Brownian 
paths and of points of infinite multiplicity in two-dimensional paths have not been settled 
yet. Another unsolved problem is that of the existence of points of uncountable multiplicity 
in one-dimensional Brownian motion. 


6. On the Distribution of the Largest and Smallest Roots of a Matrix in 
Multivariate Analysis. K. C. S. Pillai, University of North Carolina and 
University of Travancore. 


This paper presents in a more convenient and usable form than before the general 
expression for the exact c.d.f. of the largest root (from which that of the smallest can be easily 
derived) of certain sample (p × p) matrices (positive definite or positive semi-definite with 
s non-null roots, s ≤ p) arising in connection with different tests of hypotheses on p-variate 
normal populations. The exact c.d.f. is obtained for number of roots going up to eight 
(the expressions for s = 6, 7 and 8 being given for the first time). Approximations to the 
c.d.f. are given for number of roots up to five which are useful for computing percentage 
points (upper 5 per cent or less in the case of the largest root and lower 5 per cent or less 
in the case of the smallest root) for small integral values of one parameter connected with 
the sample. To illustrate the use of the approximations, exhaustive tables of upper 5 and 
1 per cent points for the largest root, in the case of two roots, have been computed; the 
error of approximation has been shown to be negligible. 


7. A Problem in Two-Stage Decision Theory. (Preliminary Report.) Morris 
Skibinsky, University of North Carolina. 


Let $D_m$ be the class of two-stage decision rules with first sample, X, of given size m, 
and second sample size which may depend on X. For a Bayes solution, this second sample 
size is given by the integral value of n which minimizes a certain function, $G_n(X)$. This paper 
is concerned with the case of independent normal observations having unit variance. It is 
required to decide between means $\theta_0$, $\theta_1$ having arbitrary but fixed a priori probabilities 
(positive and adding to 1), where cost per observation is constant and the loss functions 
are simple. The nature of the second sample size function for a Bayes test in $D_m$ is 
determined, and theorems which demonstrate its fundamental properties are proved. Results, 
leading to explicit formulas for this function, for the probabilities of wrong decisions, and 
for the expected size of the second sample, are obtained, by assuming the ratio, Z (= 
minimum wrong decision loss, over cost per observation) to be large. Comparisons, analytically 
for large Z, and tabularly for certain fixed values of the parameters, are made with analogous 
one-stage and sequential procedures, in terms of error probabilities and expected sample 
size. 

8. Some Simple Sequential Tests and Estimates for Comparing Variances. 
Allan Birnbaum, Columbia University. 


Let $x = (x_1, x_2, \cdots)$, $y = (y_1, y_2, \cdots)$ be sequences of independent observations 
from normal distributions with means 0 and variances $\sigma_1^2$ for each $x_i$, $\sigma_2^2$ for $y_i$. Let 
$u = (u_1, u_2, \cdots) = (x_1^2 + x_2^2, x_3^2 + x_4^2, \cdots)$, $v = (v_1, v_2, \cdots) = (y_1^2 + y_2^2, y_3^2 + y_4^2, \cdots)$. 
Let $R_i = \sum_{j=1}^{i} u_j$, $S_i = \sum_{j=1}^{i} v_j$. Let $T = (T_1, T_2, \cdots)$ be the sequence of all $R_i$'s and $S_i$'s 
in increasing order. Let $B = (b_1, b_2, \cdots)$ where $b_j = 1$ if $T_j$ = some $R_i$, $b_j = 0$ if $T_j$ = some 
$S_i$. Then B is a sequence of independent Bernoulli trials with $p = \Pr\{b_j = 1\} = 
(1 + \sigma_1^2/\sigma_2^2)^{-1}$. Sequential or nonsequential methods for tests and interval estimates for p 
give corresponding tests and estimates for $\sigma_1^2/\sigma_2^2$. In certain variance-components 
experiments, each replication generates a set of $n_1$ transformed observations $x_i$, $n_2$ transformed 
observations $y_i$. Then by use of a guessed value of $\sigma_1^2/\sigma_2^2$ a modified procedure will tend to 
utilize $x_i$'s and $y_i$'s at the rate at which they are generated by successive replications. The 
method generalizes to give comparisons of 3 or more variances, with B a sequence of 
multinomial observations. 
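
The construction of the Bernoulli sequence B can be sketched in a few lines. The example below is only an illustration under assumed values $\sigma_1 = 1$, $\sigma_2 = 2$. The function names and the seed are hypothetical, and truncating both streams at a common horizon is a convenience of this finite simulation rather than part of the abstract. It checks only the proportion of ones, not independence.

```python
import random

def partial_sums_of_square_pairs(sigma, count, rng):
    """Cumulative sums of x_1^2 + x_2^2, x_3^2 + x_4^2, ... for N(0, sigma^2) draws."""
    total, sums = 0.0, []
    for _ in range(count):
        total += rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
        sums.append(total)
    return sums

def bernoulli_sequence(sigma_x, sigma_y, count, rng):
    """Label 1 for each R_i and 0 for each S_i after merging them in increasing order."""
    r_sums = partial_sums_of_square_pairs(sigma_x, count, rng)
    s_sums = partial_sums_of_square_pairs(sigma_y, count, rng)
    horizon = min(r_sums[-1], s_sums[-1])   # keep only the range covered by both streams
    merged = sorted(
        [(value, 1) for value in r_sums if value <= horizon]
        + [(value, 0) for value in s_sums if value <= horizon]
    )
    return [label for _, label in merged]

rng = random.Random(54321)                  # arbitrary fixed seed
sigma_x, sigma_y = 1.0, 2.0                 # illustrative standard deviations
b = bernoulli_sequence(sigma_x, sigma_y, 20_000, rng)

p_hat = sum(b) / len(b)
p_theory = 1.0 / (1.0 + sigma_x ** 2 / sigma_y ** 2)
print(p_hat, p_theory)
```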


9. A Minimal Sequence of Statistics. R. R. Bahadur, Columbia University. 


Let $x = (x_1, x_2, \cdots)$ be a sequence of real valued random variables, and suppose that 
x is distributed according to some unknown one of a given set P of probability measures p. 
For each m, let $x_{(m)} = (x_1, x_2, \cdots, x_m)$. It is assumed that for each m the set of possible 
distributions of $x_{(m)}$ is dominated. For each m let $y_m = T_m(x_{(m)})$ be a statistic on the sample 
space of $x_{(m)}$. The sequence $\{y_m\}$ is said to be sufficient if for each m, $y_m$ is a sufficient 
statistic for P when the sample point is $x_{(m)}$; $\{y_m\}$ is said to be transitive if, for each m 
and each p in P, the conditional distribution of $y_{m+1}$ given $x_{(m)}$ depends on the condition 
only through $y_m$. The author has shown elsewhere ("Sufficiency and statistical decision 
functions," Ann. Math. Stat., (1954)) that sequences which are sufficient and transitive 
play an important role in the reduction of sequential decision problems. It is shown in this 
note that there exists a sufficient and transitive sequence $\{y_m^*\}$ such that, corresponding to 
any sequence $\{y_m\}$ which is also sufficient and transitive, there exists a sequence $F_1, F_2, \cdots$ 
of functions such that, except on a set which is of p-measure zero for each p in P, $y_m^* = 
F_m(y_m)$ for each m. The result has application to the problem of determining the maximum 
possible reduction of sequential decision problems by the principle of sufficiency. 


10. Strong Convergence of Stochastic Approximation Methods of Robbins- 
Monro and Wolfowitz-Kiefer. M. N. Ghosh, University of North Carolina. 


The Robbins-Monro scheme of stochastic approximation of the root $\theta$ of the regression 
equation $M(x) = \alpha$ has been shown to converge strongly to $\theta$ under the following conditions: 
1) $\limsup |(M(x) - \alpha)/(x - \theta)| < k$ as $|x| \to \infty$, and $\int_{-\infty}^{\infty} [y - M(x)]^2\, dH(y \mid x) \le \sigma^2$; 2) 
$M(x) \le \alpha - \varphi(\delta)$ for $x < \theta - \delta$, $M(x) \ge \alpha + \varphi(\delta)$ for $x > \theta + \delta$, where $\varphi(\delta) > 0$; and 3) 
$\sum a_n$ is divergent, $\sum a_n^2$ is convergent. The Wolfowitz-Kiefer process for estimating the 
maximum of the regression function has also been shown to be strongly convergent to $\theta$, 
the point at which M(x) is maximum, under nearly the same assumptions as in K and W except 
that we do not need M(x) to satisfy the Lipschitz condition, that is (2.8) in K and W. 


11. An Ergodic Property of the Brownian Motion Process. (Preliminary Report.) 
Cyrus Derman, Columbia University. 


Let X(t), $0 \le t < \infty$, be a one-dimensional separable Brownian motion process. The 
following theorem is proved. If f(x) and g(x) are any real-valued Borel-measurable functions 
summable in the line $-\infty < x < \infty$, then with probability one 

$\lim_{T \to \infty} \int_0^T f(X(t))\, dt \Big/ \int_0^T g(X(t))\, dt = \bar{f}/\bar{g}$ 

provided that $\bar{g} \ne 0$ where $\bar{f} = \int_{-\infty}^{\infty} f(x)\, dx$ and $\bar{g} = \int_{-\infty}^{\infty} g(x)\, dx$. This theorem is a 
probability one version of a theorem proved by G. Kallianpur and H. Robbins, "Ergodic 
property of the Brownian motion process," Proc. Nat. Acad. Sci., Vol. 39 (1953), pp. 525-533. A 
method first used by Doeblin (Bull. Soc. Math. France, 1938) and later exploited more fully 
by Chung (to appear in the Trans. Amer. Math. Soc.) was used to prove the theorem. 


12. On the Distribution of Hotelling's Generalized T Test. K. C. S. Pillai, 
University of North Carolina and University of Travancore. 


Let S* and S be two independent sample covariance matrices belonging to two p-variate 
normal populations with m and n degrees of freedom respectively. Hotelling defines a 
measure of multivariate dispersion $T_0^2$, given by $T_0^2/m = \operatorname{trace} S^{-1}S^*$. According to this definition 
S and S* are positive definite (p × p) matrices. Assume that S* can also be positive 
semi-definite with s nonnull characteristic roots, s ≤ p, and denote by $U^{(s)}$ the trace 
of $(m/n)S^{-1}S^*$. Establishing certain recurrence relations between the moment generating 
functions of $U^{(s)}$ and $U^{(s-1)}$, the lower order moments of $U^{(s)}$ are obtained. These suggest 
an approximation to the p.d.f. of $U^{(s)}$ in the form of an F distribution, where $U^{(s)}/s$ is 
distributed as $\nu_1 F/\nu_2$ with $\nu_1 = s(2m' + s + 1)$ and $\nu_2 = 2(sn' + 1)$, m' and n' being functions 
of m, n and s. For s = 1, the approximate p.d.f. reduces to that of Hotelling's $T^2$ and is 
exact. For s = 2, the accuracy of approximation has been discussed by comparison with the 
exact c.d.f. obtained by Hotelling. The approximate distribution guarantees sufficient 
accuracy for practical use. 


13. Power and Sample Size for Small Samples on Testing Hypotheses 
Concerning a Bernoulli Variable. Howard Raiffa, Columbia University. 


The problem of testing a simple hypothesis versus a single alternative concerning the 
parameter p of a Bernoulli variable is considered. By example, it is shown that if the 
number of successes is used as the sample point: a) a decision rule which is admissible among 
the nonrandomized rules is not necessarily admissible; b) if we confine ourselves to 
nonrandomized strategies then, for a given significance level, increasing the sample size might 
decrease the power of the most powerful test; and c) among the class of randomized tests, 
for a given significance level, increasing the sample size does not necessarily increase the 
power of the most powerful test. Assertion c), which is not generally realized, is illustrated 
for the case where the type II error is not zero. This is easily explained by returning to the 
sample space comprising sequences of successes and failures. Emphasis is laid on the fact 
that the desirability of increasing power by increasing sample size is intimately related to 
the actual significance level of the tests. Attention is drawn to similar examples involving 
the multinomial distribution. 
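
Assertion b) can be illustrated with a small numerical case that is not taken from the abstract. The hypotheses p = 0.5 against p = 0.8, the level 0.05 and the sample sizes 5 and 6 are assumptions of this example. The sketch searches all nonrandomized rejection regions based on the number of successes and keeps the most powerful one whose size does not exceed the level.

```python
from itertools import combinations
from math import comb

def pmf(n, k, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def best_nonrandomized_test(n, p0, p1, alpha):
    """Most powerful rejection region (a set of success counts) of size at most alpha."""
    best_power, best_region = 0.0, ()
    for size in range(n + 2):
        for region in combinations(range(n + 1), size):
            level = sum(pmf(n, k, p0) for k in region)
            if level <= alpha:
                power = sum(pmf(n, k, p1) for k in region)
                if power > best_power:
                    best_power, best_region = power, region
    return best_power, best_region

p0, p1, alpha = 0.5, 0.8, 0.05      # illustrative hypotheses and level
power5, region5 = best_nonrandomized_test(5, p0, p1, alpha)
power6, region6 = best_nonrandomized_test(6, p0, p1, alpha)

print(region5, power5)   # rejects only when all 5 trials succeed
print(region6, power6)   # larger sample, smaller attainable power at this level
```

In this case the attainable size drops from 1/32 to 2/64 at best without admitting a more useful rejection point, which is why the power falls when the sample size rises from 5 to 6.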


(Abstracts of papers presented at the Gainesville meeting of the Institute, March 18, 1954) 


14. On the Central Limit Theorem for m-Dependent Variables. P. H. Diananda, 
University of North Carolina and University of Malaya. 

Let $X_1, X_2, \cdots$ be a sequence of m-dependent random variables with zero means and 
finite variances. Let $S_n = X_1 + \cdots + X_n$ and $s_n = \sqrt{E(S_n^2)}$. Suppose that, as $n \to \infty$, 
(1) $\liminf (s_n^2/n) > 0$, (2) $\limsup E(X_n^2) < \infty$, and (3) for every fixed $\epsilon > 0$, 

$s_n^{-2} \sum_{i=1}^{n} \int_{|x| > \epsilon s_n} x^2\, dF_i(x) \to 0,$ 

where $F_i(x)$ is the distribution function of $X_i$ (i = 1, 2, ...). Then $S_n/s_n$ is asymptotically 
distributed as a standardized normal variable. If, further, $F_i(x)$ is independent of 
i (i = 1, 2, ...) then (2) and (3) are automatically satisfied and (1) is a sufficient condition 
for the result. These results and their analogues in the vector case generalize results given 
by the author in an earlier paper, "The central limit theorem for m-dependent variables 
asymptotically stationary to second order," (to appear in Proc. Camb. Phil. Soc.). A 
property of joint distribution functions of m-dependent variables with finite variances, 
given in the earlier paper, is used in an improved form in proving the results of this paper. 


15. On a Property of a Class of Decision Procedures for Ranking Means of 
Normal Populations. (Preliminary Report.) K. C. Seal, University of North 
Carolina. 


Suppose there are (n + 1) normal populations $N(\mu_i, \sigma^2)$, i = 0, 1, 2, ..., n, with unknown 
means and a common but unknown variance, and that one random observation from each 
of these (n + 1) populations is given. It is desired to choose the smallest group of 
populations which includes the population with greatest mean. Suppose an estimate $s^2$ of $\sigma^2$ is 
known which is independent of the given observations $x_i$ (i = 0, 1, ..., n). Let $1 - \alpha$ 
($0 < \alpha < 1$) be the g.l.b. of the desired probability of correct choice, whatever may be 
the $\mu_i$'s (i = 0, 1, ..., n). The class (corresponding to different sets of c's) of decision rules given 
below has the property that probability of incorrect choice never exceeds that of correct 
choice. Let $t_\alpha^{c_1 \cdots c_n}$ ($c_i \ge 0$; i = 1, ..., n; $\sum_{i=1}^{n} c_i = 1$) denote the upper α per cent point 
in the p.d.f. of $t^{c_1 \cdots c_n} = [\sum_{i=1}^{n} c_i y_{(i)} - y_0]/s$, where $y_i$ (i = 0, 1, ..., n) are (n + 1) random 
observations from $N(0, \sigma^2)$ and $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ are n ranked observations among 
$y_1, \cdots, y_n$. The class of decision rules is defined as follows: "Reject any observation $x_0$ 
from the given observations $x_i$ (i = 0, 1, ..., n) if $\sum_{i=1}^{n} c_i x_{(i)} - x_0 \ge s\, t_\alpha^{c_1 \cdots c_n}$ ($c_i \ge 0$; i = 
1, ..., n; $\sum_{i=1}^{n} c_i = 1$), and accept otherwise; $x_{(i)}$ stands for the ith ranked observation 
among the $x_i$'s (i = 1, ..., n). Proceed as above for each of (n + 1) observations separately." 
Other properties of this class of decision procedures and the selection of an optimum rule 
from this class are under investigation. 


16. Simultaneous Confidence Bounds on Canonical Regressions. S. N. Roy, 
University of North Carolina. 


In an earlier paper ("Simultaneous confidence interval estimation" by S. N. Roy and 
R. C. Bose, Ann. Math. Stat., Vol. 24 (1953), pp. 513-536) simultaneous confidence bounds 
on canonical regression coefficients were given with an exact joint confidence coefficient. 
The confidence statement itself was, however, quite complicated and not of much direct 
physical use. The present paper uses a technique recently developed by the author (and 
reported at the last meetings of the Institute of Mathematical Statistics) to obtain a set of 
confidence bounds, much simpler and physically more usable, but with a joint confidence 
coefficient greater than or equal to a pre-assigned level, the level being one that is also 
actually attained. 


17. A New Test of Compound Symmetry. S. N. Roy, University of
North Carolina. 


It is well known that if $x_1$ and $x_2$ have a bivariate normal distribution with variances
$\sigma_1^2$ and $\sigma_2^2$ and correlation coefficient $\rho$, then $x_1 + x_2$ and $x_1 - x_2$ have zero correlation
provided that $\sigma_1 = \sigma_2$. It is also well known that this fact and the central distribution of the
correlation coefficient are used to test the hypothesis $\sigma_1 = \sigma_2$, which is the hypothesis of
compound symmetry for a bivariate normal population. For an $N(\xi, \Sigma)$, where $\xi$ is $p \times 1$
and $\Sigma$ is $p \times p$, the corresponding hypothesis is that all the diagonal elements of $\Sigma$ are
equal, and so also all the nondiagonal ones. Starting from the bivariate compound symmetry
test and using a technique discussed in an earlier paper (“On a heuristic method of
test construction and its use in multivariate analysis” by S. N. Roy, Ann. Math. Stat.,
Vol. 24 (1953), pp. 220-238) a test of this hypothesis is obtained in terms of the largest
characteristic root of a matrix and its distribution.
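
The bivariate fact on which this abstract builds follows from Cov(x1 + x2, x1 - x2) = Var(x1) - Var(x2), which vanishes exactly when the two variances are equal. The short Python sketch below evaluates the population correlation of the sum and the difference; the function name `corr_sum_diff` is hypothetical and the sketch covers only the bivariate case, not the p-variate test announced in the abstract.

```python
import math


def corr_sum_diff(sigma1, sigma2, rho):
    """Population correlation between x1 + x2 and x1 - x2.

    sigma1, sigma2 : standard deviations of x1 and x2 (both positive)
    rho            : correlation of x1 and x2, with |rho| < 1
    """
    # The cross terms cancel: Cov(x1 + x2, x1 - x2) = Var(x1) - Var(x2).
    cov = sigma1 ** 2 - sigma2 ** 2
    var_sum = sigma1 ** 2 + sigma2 ** 2 + 2 * rho * sigma1 * sigma2
    var_diff = sigma1 ** 2 + sigma2 ** 2 - 2 * rho * sigma1 * sigma2
    return cov / math.sqrt(var_sum * var_diff)


print(corr_sum_diff(2.0, 2.0, 0.3))  # equal variances: zero correlation
print(corr_sum_diff(3.0, 1.0, 0.0))  # unequal variances: non-zero correlation
```

Testing whether this correlation is zero is therefore equivalent to testing equality of the two variances, whatever the value of the correlation between x1 and x2.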
18. A 2 X 2 Factorial with Paired Comparisons. Robert M. Abelson and
Ralph Allan Bradley, Virginia Polytechnic Institute.

The parameters previously specified for a method of paired comparisons are redefined in
such a way as to permit the use of treatments in factorial array. The algebraic procedure
is shown in general but the normal equations resulting from the use of maximum likelihood
are nonlinear and difficult to solve. Easy solution of the normal equations seems to be
limited to the 2 X 2 factorial and an explicit solution is given for that case. The method of
paired comparisons presented for 2 X 2 factorial treatments permits most of the comparisons
available through usual analysis of variance. It is possible to test for the presence of
both main effects and their interaction. A numerical example is included.


19. On Wald’s Confidence Interval for the Ratio of Variances in a Variance 
Components Model. W. A. Thompson, Jr., Virginia Polytechnic Institute
and University of North Carolina. 

Wald’s confidence interval (‘‘A note on regression analysis,’’ Ann. Math. Stat., Vol. 18 
(1947), p. 586) is specialized to the case of incomplete block designs with random block 
effects. A theorem concerning the multiplicity of the characteristic roots of the
variance-covariance matrix of the adjusted yields is discussed and applied to Wald’s confidence
interval. A practical example is discussed. This work was done under contracts with the
Air Force and the Quartermaster Corps.




NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items

Paul M. Blunk has accepted the position of Operations Analyst with the Con- 
solidated Vultee Aircraft Corporation at Fort Worth, Texas. 

Dr. R. S. Burlington, Chief Mathematician of the Bureau of Ordnance, Navy
Department, and head of the Evaluation and Analysis Group of the Bureau of 
the Ordnance, has been named Special Assistant to the Director of Research 
and Development, Bureau of Ordnance, Navy Department, Washington, D. C. 

Visiting Associate Professor Kai Lai Chung of Cornell University has been 
appointed Associate Professor at Syracuse University. He is in charge of an 
ARDC Research project on probability and statistics there. 

Charles W. Dunnett, formerly Biometrician for the Food and Drug Labora- 
tory, Ottawa, Canada, is now on the statistical staff of the Lederle Laboratories 
Division of the American Cyanamid Company located in Pearl River, New York. 

Edward A. Fay, formerly a graduate student at the University of California, 
has been employed since September 1950 as a statistician with the United States 
Naval Ordnance Test Station, China Lake, California. 

Professor E. J. Gumbel, Columbia University, has been appointed Visiting 
Professor for Statistics at the Free University, Berlin (West) for the summer 
term 1954. Professor Gumbel has also been elected a member of the International 
Statistical Institute at The Hague. 


Stuart T. Hadden, formerly Chemical Engineer with the Research & Development
Division of Socony-Vacuum Oil Company, Inc. in Paulsboro, New Jersey,
has joined the Research Division of New York University as Research Associate 
in the Chemical Engineering Department and is taking graduate courses in 
Chemical Engineering and Mathematics. 

Daniel R. Embody is now employed as a Biometrician in the Division of 
Analytical and Physical Chemistry of the Squibb Institute, New Brunswick, 
New Jersey. 

Dr. J. F. Hannan, formerly of the Catholic University of America, has ac- 
cepted an assistant professorship at Michigan State College, East Lansing, 
Michigan. 

Clifford Hildreth, formerly Associate Professor in the Cowles Commission, 
University of Chicago, has accepted a position as Professor of Agricultural Eco- 
nomics, North Carolina State College, Raleigh, North Carolina. 

William C. James of the United States Embassy has been transferred from 
Lima, Peru and expects to be stationed in Caracas, Venezuela for the next two 
years.

Dr. Gopinath B. Kallianpur returned to India in July 1953 and has joined 
the Indian Statistical Institute in Calcutta. On his way home he visited the 
Statistical Laboratory at Cambridge and the Institut Henri Poincaré in Paris. 

E. P. King is now employed as a Consulting Statistician on the staff of the 
Control Division of Eli Lilly and Company. 

Boyd Ladd, formerly with the Bureau of the Budget, Office of Statistical 
Standards, as a specialist in Statistical Methods and in Transportation Statistics, 
has joined the Operations Research Office, Johns Hopkins University where he 
is doing operations analysis in logistics problems. 

Eugene Lukacs, formerly with the National Bureau of Standards, is now 
serving as head of the Statistics Branch, Office of Naval Research. 

J. W. Mayne, formerly employed with the Defence Research Board as Chief 
of the Statistical Analysis Section and of The Northern Operational Research 
Section of the Operational Research Group, on November 1, 1953 was posted to 
HQ Western Command and HQ Tactical Air Command as Senior Operational 
Research Officer of the Joint Services Operational Research Team, Edmonton,
Alberta. 

O. B. Moan has accepted the position of Staff Quality Control Engineer on 
the Director’s Staff of Hughes Aircraft Company, Culver City, California. 

Cordell B. Moore, formerly instructor of mathematics and graduate student 
at the University of Kentucky, Lexington, Kentucky, is now employed as mathe- 
matical statistician (Senior Aerophysics Engineer), Consolidated Vultee Aircraft 
Corporation, Fort Worth Division, Fort Worth, Texas. 

James Pachares has accepted a position as statistician at the Naval Air Missile 
Test Center, Point Mugu, California. 

Professor Raymond P. Peterson, Jr., of the University of Washington has 
accepted the position of Assistant Professor and Head of the Mathematics 
Department, University of California, Riverside, California.







Dr. L. J. Reed, who has recently retired from his position as Vice-President 
of the Johns Hopkins University and Hospital, has been appointed President of 
the University. 

A. T. Reid, formerly Research Assistant, Committee on Mathematical Biology, 
University of Chicago, is now a Graduate Student and Research Assistant in 
the Department of Mathematical Statistics, Columbia University. 

I. M. Sahni has accepted a position as Senior Research Officer and Private 
Secretary to the Deputy Chairman of the Planning Commission, New Delhi, 
India. 

Professor Arnold J. F. Siegert has received a Guggenheim Fellowship and a 
leave from Northwestern University and is at the Institute for Advanced Study, 
Princeton, New Jersey during the current academic year. 

George W. Snedecor has accepted an appointment to serve as professor in 
experimental statistics winter quarter 1954 at Alabama Polytechnic Institute, 
Auburn, Alabama. He will advise and consult with the research staff on experi- 
mental design and statistics. This work is part of a cooperative program of 
statistics among the southeastern states sponsored by the Institute of Statistics 
of the Consolidated University of North Carolina and by the General Education 
Board. For the spring quarter, Professor Snedecor will return to the Department
of Statistics, Iowa State College. 




Department of Statistics, University of Connecticut, Storrs, Connecticut 


The University of Connecticut has established a Department of Statistics in 
the College of Arts and Sciences, beginning spring semester, 1954. The depart- 
ment will be under the direction of Dr. Geoffrey Beall, Biometrician and Pro- 
fessor of Statistics of the University. Courses in general and theoretical statistics 
will be concentrated in the new department but specialized applicational courses 
will continue to be given in other parts of the University organization. 




New Members 


The following persons have been elected to membership in the Institute 
November 17, 1953 to February 9, 1954 

Abramson, Lee R., Student, Columbia University, 315 E. 206th Street, New York 67, New 
York. 

Adu, Anthony A., B.S. (Roosevelt College), Technical Statistician, Cowles Commission for 
Research in Economics, University of Chicago, Chicago 37, Illinois, 243 International 
House, 1414 East 59th Street, Chicago 37, Illinois. 

Berndt, Gerald D., M.S. (New York Univ.), Meteorologist, Quartermaster Research &
Development Center, Department of the Army, 42 Fenelon Road, Framingham, Massachusetts.

Brickley, Robert L., M.S. (Purdue Univ.), Statistician, Materials Engineering Division,
Westinghouse Electric Corporation, Box 230 Lincoln Hwy., East McKeesport, Pennsylvania.

Bright, Harold F., Ph.D. (Univ. of Texas), Chief, Technical Services, Human Resources 
Research Office, The George Washington University, Box 3596, Washington 7, D. C., 
Route 4, Box 648, Old Braddock Road, Fairfax, Virginia. 

Brooks, Samuel H., B.S. (Univ. of Maryland), Student, Department of Biostatistics, Johns 
Hopkins School of Hygiene, 615 N. Wolfe Street, Baltimore 5, Maryland. 

Buckland, William R., Ph.D. (London), Senior Executive Assistant, Statistical Section, 
Development & Research Department, London Transport Executive, 55 Broadway, 
Westminster, London 8. W. 1, England. 

Cargo, Gerald T., M.S. (Univ. of Michigan), Private in the Army, Surveillance Laboratory,
Ballistics Research Laboratory, Aberdeen Proving Ground, Maryland, U.S.
§2346636, 9301 TSU (Ord) BRL Det., Aberdeen Proving Ground, Maryland.

Demen, Theodosius L. (Rev.), B.S. (The Theological Academy of Zirc, Hungary), Graduate 
Student, Department of Mathematics and Assistant, Physics Department, Marquette 
University, 1639 N. 5th Street, Milwaukee, Wisconsin. 

Duffett, James R., M.S. (Univ. of Florida), Mathematical Statistician, OCO Guided Missile 
Reliability Group, Los Angeles Ordnance District, 55 South Grand Avenue, Pasadena 2, 
California. 

Flanagan, Joseph E., Ph.D. (Univ. of Illinois), Assistant Professor of Mathematics,
Mathematics Department, Carnegie Institute of Technology, Pittsburgh 13, Pennsylvania.

Goodman, Nathaniel R., M.S. (Illinois Inst. of Tech.), Student, Princeton University, 
Room 611, 1908 Hall, Princeton University, Princeton, New Jersey. 

Hook, Lee H., B.S. Ch.E. (Purdue Univ.), Teaching Assistant and Student, 706 Delaware 
Street, S.E., Minneapolis 14, Minnesota. 

Merrill, W. Jay, Jr., A.B. (Univ. of Delaware), Statistician, Development Planning Group,
IBM Engineering Development Center, Endicott, New York, 58 Grand Boulevard, 
Binghamton, New York. 


Morrison, Donald F., B.S. (Boston Univ.), Graduate Assistant, Mathematics Department,
College of Liberal Arts, Boston University, 725 Commonwealth Avenue, Boston 15,
Massachusetts, 64 Boardman Avenue, Melrose 76, Massachusetts.

Ney, Peter E., M.A. (Columbia Univ.), Assistant and Student, Mathematical Statistics, 
Columbia University, 5410 Netherland Avenue, New York 71, New York.

Odeh, Robert E., B.S. (Carnegie Inst. of Tech.), Graduate Student, Mathematical Statistics,
Department of Mathematics, Carnegie Institute of Technology, Pittsburgh 13,
Pennsylvania, 5054 Forbes Street, Pittsburgh 18, Pennsylvania.

Perrault, William E., M.S. (Univ. of Michigan), Graduate Student, St. Louis University, 
12387 South 14th Street, St. Louis, Missouri. 

Roach, W. L., M.A. (Univ. of Oregon), Graduate Assistant, Department of Mathematics, 
University of Oregon, Eugene, Oregon. 

Smolak, Jean F., B.S. (Univ. of Michigan), Senior Statistician, Biological Control, E. R.
Squibb & Sons, New Brunswick, New Jersey, 40 West Spring Street, Somerville, New
Jersey.

Summers, J. F., M.A. (Univ. of Texas), Machine Methods and Procedures Analyst, The
Texas Company, Producing Department, Houston, Texas, The Texas Company, P. O.
Box 2332, Houston 1, Texas.

Teng, Lincoln C., M.S. (Univ. of Michigan), Graduate Student in Mathematics, Massachusetts
Institute of Technology, 282 Massachusetts Avenue, Cambridge, Massachusetts.

Turner, Malcolm E., Jr., B.A. (Duke Univ.), Graduate Research Assistant, Department of
Biostatistics, School of Public Health, University of North Carolina, 3517 East Oak
Drive, Durham, North Carolina.

Ullrich, Egon, Ph.D. (Universität Graz), ord. Professor (Mathematik), Direktor des Mathematik
Instituts der Justus-Liebig-Hochschule (Universität) Giessen, Johannesstr. |
(Hessen), Bundesrepublik Deutschland.

Wilkinson, John W., M.A. (Queen’s Univ., Kingston, Canada), Graduate Student, Institute
of Statistics, University of North Carolina, 396 McCauley Street, Chapel Hill, North
Carolina.

Wine, R. Lowell, M.A. (Univ. of Virginia), Student, Virginia Polytechnic Institute, Route
One, Box 311, Roanoke, Virginia.




REPORT OF THE ITHACA MEETING OF THE INSTITUTE 


The fifty-ninth meeting of the Institute of Mathematical Statistics was held 
in Ithaca, New York on March 18-20, 1954. Among the members of the Institute 
who attended were: 


R. R. Bahadur, Allan Birnbaum, Robert Bechhofer, Julius Blum, Isadore Blumen,
Kai Lai Chung, Louis Cote, Edwin L. Cox, Cuthbert Daniel, Reed B. Dawson, Jr., Cyrus
Derman, T. G. Donnelly, J. E. Dowd, Rudolf Drenick, Charles Dunnett, A. Dvoretzky,
W. T. Federer, Albert Folop, M. N. Ghosh, John E. Freund, Frances Hobson, J. E. Jackson,
Mark Kac, Leo Katz, Jack Kiefer, H. G. Landau, S. B. Littauer, Stuart P. Lloyd, Eugene
Lukacs, Philip J. McCarthy, P. G. Moore, Norman Morse, R. B. Murphy, M. J. Netzorg,
Gottfried Noether, Carl R. Ohman, Emanuel Parzen, Howard Raiffa, Herbert Robbins,
Douglas S. Robson, Jerome Sacks, Richard Schwartz, Arnold Siegert, Morris Skibinsky,
Milton Sobel, R. G. D. Steel, Henry Teicher, Jacob Wolfowitz.


The program of the meeting was as follows. 


THURSDAY, MARCH 18, 1954

9:00 a.m. Registration

10:30 a.m. Problems of Randomness
Chairman: R. E. Bechhofer, Cornell University
Papers: 1. Random Processes in Physics, Arnold J. F. Siegert, The Institute for
           Advanced Study, Princeton, N. J. (on leave from Northwestern University)
        2. Some Sequential Tests of Randomness, Gottfried E. Noether, Boston
           University.

1:30 p.m. Contributed Papers I
Chairman: P. J. McCarthy, Cornell University
Papers: 1. Confidence Region Procedures Based on the Logarithm of the Likelihood,
           Carl R. Ohman, Princeton University.
        2. A One-Sided Confidence Interval for an Unknown Distribution Function,
           Herbert E. Robbins, Columbia University.
        3. The Mean Successive Difference in Samples from an Exponential Population,
           P. G. Moore, University College, London, and Princeton University.
        4. Application of the Duality Theorem of Linear Programming to Testing
           Hypotheses, Howard Raiffa, Columbia University.
        5. Multiple Points of Paths of Brownian Motion in the Plane. (By Title).
           A. Dvoretzky, Columbia University, P. Erdos, Notre Dame University, and
           S. Kakutani, Yale University.
        6. On the Distribution of the Largest and Smallest Roots of a Matrix in
           Multivariate Analysis. (By Title). K. C. S. Pillai, University of North
           Carolina and University of Travancore.

3:00 p.m. Applications of Stochastic Processes
Chairman: Isadore Blumen, Cornell University
Papers: 1. Remarks on the Wiener Process and the Estimation of its Parameters,
           Eugene Lukacs, Office of Naval Research.
        2. On Some Problems in Stochastic Processes, Mark Kac, Cornell University.

FRIDAY, MARCH 19, 1954

9:00 a.m. Recent Developments in Stochastic Approximation
Chairman: Leo Katz, Michigan State College
Papers: 1. Stochastic and Nonstochastic Approximation Methods, Jack Kiefer, Cornell
           University.
        2. Strong Consistency of Stochastic Approximation Methods, Julius Blum,
           Indiana University.
        3. Asymptotic Properties of the Robbins-Monro Procedure, Kai Lai Chung,
           Syracuse University.
Discussant: Herbert E. Robbins, Columbia University

11:00 a.m. Biological Sampling
Chairman: Walter T. Federer, Cornell University
Papers: 1. Tag-recapture methods, Edwin L. Cox, Case Institute of Technology.
Discussant: Douglas S. Robson, Cornell University

1:30 p.m. Contributed Papers II
Chairman: R. G. D. Steel, Cornell University
Papers: 1. A Problem in Two-stage Decision Theory. (Preliminary Report). Morris
           Skibinsky, University of North Carolina.
        2. Some Simple Sequential Tests and Estimates for Comparing Variances,
           Allan Birnbaum, Columbia University.
        3. A Minimal Sequence of Statistics, R. R. Bahadur, Columbia University.
        4. Strong Convergence of Stochastic Approximation Methods of Robbins,
           Monro, Wolfowitz and Kiefer, M. N. Ghosh, University of North Carolina.
           (Introduced by B. G. Greenberg).
        5. On an Ergodic Property of the Brownian Motion Process, Cyrus Derman,
           Columbia University.
        6. On the Distribution of Hotelling’s Generalized T Test. (By Title). K. C. S.
           Pillai, University of North Carolina and University of Travancore.
        7. Power and Sample Size for Small Samples on Testing Hypotheses Concerning
           a Bernoulli Variable. (By Title). Howard Raiffa, Columbia University.

3:00 p.m. Problems in Applied Probability
Chairman: Milton Sobel, Cornell University
Papers: 1. Estimation by the Minimum Distance Method, Henry Teicher, Purdue
           University.
        2. The Theory of Queues, J. Wolfowitz, Cornell University.

SATURDAY, MARCH 20, 1954

9:00 a.m. Applications of Statistics to Industrial Problems
Chairman: J. Edward Jackson, Eastman Kodak
Papers: 1. Multiple Comparisons Test for Comparing Several Treatments with a
           Control, Charles W. Dunnett, Lederle Laboratories Division, American
           Cyanamid Co.
        2. Industrial Uses of Fractional Replication, Cuthbert Daniel, Engineering
           Statistician.
Discussants: S. B. Littauer, Columbia University
             R. B. Murphy, Bell Telephone Laboratories


ISADORE BLUMEN 
Assistant Secretary 




REPORT OF THE GAINESVILLE MEETING OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 


The 60th meeting of the Institute of Mathematical Statistics was held jointly 
with the Biometrics Society, ENAR, on the campus of the University of Florida, 
Gainesville, Florida on Thursday, March 18, 1954. 

The following 26 members of the Institute attended the meeting: 


R. A. Bradley, A. C. Cohen, Gertrude Cox, Lee Crump, J. H. Curtiss, James R. Duffett,
David B. Duncan, G. L. Edgett, R. J. Hader, G. Ronald Herd, Walter W. Hoy, J. S. Hunter,
M. G. Kendall, Boyd Ladd, Herbert A. Meyer, Dayle D. Rippe, S. N. Roy, K. C. Seal,
G. W. Snedecor, Paul N. Somerville, D. E. South, Melvin G. Springer, John W. Tukey,
M. C. K. Tweedie, Harry Weingarten, and John Woodward.


On Thursday morning Dr. Linton E. Grinter, Dean of the Graduate School,
University of Florida, introduced by Dr. Herbert A. Meyer, welcomed the 
members to the campus. Professor D. E. South, also of the University of Florida, 
presided at the opening session at which there were approximately 70 present. 
The following three papers on truncation problems and applications were presented:
Some Recent Advances Concerning Estimation from Truncated and Censored
Samples, by A. C. Cohen, Jr., University of Georgia; Discussion
and Applications of Truncation Theory, I, by J. R. Duffett, White Sands Proving
Grounds; Discussion and Applications of Truncation Theory, II, by John Woodward,
Ballistic Research Laboratories, Aberdeen Proving Ground.

At the second morning session, with Professor G. L. Edgett, Virginia Polytechnic
Institute and Queens University in the chair, contributed papers, the
abstracts of which appear elsewhere in this issue, were presented as follows:


1. On the Central Limit Theorem for m-Dependent Variables. P. H. Diananda, University
of North Carolina and University of Malaya. (Introduced by H. Hotelling).

2. On a Property of a Class of Decision Procedures for Ranking Means of Normal Populations.
(Preliminary Report). K. C. Seal, University of North Carolina.

3. Simultaneous Confidence Bounds on Canonical Regressions. S. N. Roy, University of
North Carolina.

4. A New Test of Compound Symmetry. (By title). S. N. Roy, University of North Carolina.


A session on Quantitative Genetics with Professor G. W. Snedecor of Iowa
State College presiding was conducted on Thursday afternoon at which the
following papers were presented: A Model for the Study of Quantitative Inheritance,
by Virgil Anderson, Purdue University; and An Extension of the Concept of 
Partitioning Hereditary Variance for Analysis of Covariances Among Relatives 
When Epistasis is Present, by C. Clark Cockerham, North Carolina State College. 

At a mid-afternoon session on The Training of Statisticians in the South, with 
Professor Herbert A. Meyer, University of Florida presiding, the following two 
papers were given: Report of the Committee of the Institute of Statistics on the 
Training of Statisticians, by G. E. Nicholson, Jr., University of North Carolina; 
and Plans and Objectives of the Southern Regional Graduate Summer Sessions 
in Statistics in 1954, by Ralph A. Bradley, Virginia Polytechnic Institute. 

Contributed papers were presented at a late afternoon session at which Pro- 
fessor P. N. Somerville of Virginia Polytechnic Institute presided. The abstracts 
of those papers as listed are also printed in this issue: 


1. A 2 X 2 Factorial with Paired Comparisons. Robert M. Abelson and Ralph Allan Bradley,
Virginia Polytechnic Institute. (Presented by Professor Bradley.)

2. On Wald’s Confidence Interval for the Ratio of Variances in a Variance Components
Model. W. A. Thompson, Jr., Virginia Polytechnic Institute.


On Thursday evening the members were honored by a reception held in the 
Social Room of the Florida Union. 


HERBERT A. MEYER
Assistant Secretary

PUBLICATIONS RECEIVED


BLANC-LAPIERRE, A. AND R. FORTET, Théorie des Fonctions Aléatoires, Masson and Cie,
Paris, 1953, xvi + 693 pp., 6.500 francs.

BURINGTON, RICHARD STEVENS AND DONALD CURTIS MAY, JR., Handbook of Probability and
Statistics with Tables, Handbook Publishers, Inc., Sandusky, Ohio, 1953, ix + 332 pp.

DAVID, F. N. AND M. G. KENDALL, “Tables of Symmetric Functions, Part IV,” Biometrika,
Statistical Table No. 18, Cambridge University Press, 1953, 20 pp., 4 shillings.

Econométrie, Vol. XL, Colloques Internationaux du Centre National de la Recherche
Scientifique, Paris, 1953, v + 334 pp.

LATSCHA, R., “Tests of Significance in a 2 X 2 Contingency Table: Extension of Finney’s
Table,” Biometrika, Statistical Table No. 17, Cambridge University Press, 1953, 13
pp., 2 shillings and sixpence.

STEVENS, W. L., “Tables of the Angular Transformation,” Biometrika, Statistical Table
No. 16, Cambridge University Press, 1953, 4 pp., 1 shilling.